SYMMETRIC UNCERTAINTY ANALYSIS AND ITS
IMPLICATIONS FOR PSYCHOLOGY¹


W. R. GARNER 
Johns Hopkins University 


Information theory, as developed 
primarily by Shannon (8), has had 
considerable impact on psychological 
research. Information theory has 
been used as a model for many differ- 
ent types of behavior; it has also pro-
vided psychologists with a statistical
measure which has many useful prop- 
erties. Frequently the measure is 
used where the term information is 
somewhat inappropriate, and for this 
reason Garner and McGill (3) have 
suggested that the term uncertainty 
be applied to the measure to divorce 
its statistical properties from the con-
tent implications of the term informa-
tion. Even when the mathematical
properties of the uncertainty measure 
lead to implications for behavior situ- 
ations, information theory does not 
necessarily define the problem; rather, 


1 Parts of this paper were presented at the 
Fifteenth International Congress of Psy- 
chology, Brussels, 1957. This work was sup- 
ported by Contract N5-ori-166, Task Order 
1, between the U. S. Office of Naval Research 
and The Johns Hopkins University. This is 
Report No. 166-I-213, Project Designation 
No. NR 145-089, under that contract. Re- 
production in whole or in part is permitted 
for any purpose of the United States Govern-
ment. I wish to acknowledge the valuable
critical comments which Alphonse Chapanis 
made on the manuscript. 








the mathematical properties of the un-
certainty measure simply make clear
certain relations which other statis-
tical techniques have not been able to
reveal. The purpose of this paper is to
develop some equations of this sort 
involved in uncertainty analysis, and 
to point out the relevance of these 
equations to a few psychological 
problems. 

Two valuable properties of the 
measure of uncertainty, as pointed 
out by Garner and McGill (3), are: 
(a) that it is a nonparametric measure 
which can be applied to any set of 
categorized data, and (b) that the un- 
certainty of a variable can be parti- 
tioned into component parts, much as 
variance is partitioned in analysis of 
variance. A third useful property of 
the uncertainty measure is that an un- 
certainty analysis can be carried out 
in a completely symmetric form, in 
which it is unnecessary to distinguish 
between meanings or functions of 
the several variables involved in the 
analysis. This last property derives 
in part from the nonparametric nature 
of the measure, and is not a property 
of analysis of variance, in which differ- 
ent properties of the criterion and 
predictor variables are assumed.

ANALYSIS OF THREE VARIABLES


For illustration purposes, let us con- 
sider first a three-dimensional matrix 
of data. Figure 1 shows schematic- 
ally such a data array and some of 
the terms which we will be using. 
The three variables are w, x, and y, 
and they can assume values w_i, x_j, and
y_k. Each cell in the matrix originally
contains a number which represents 
the frequency of occurrence of cases 
having that particular combination of 
values of w, x, and y. For our pur- 
poses, we have shown the matrix in 
which these frequencies have been 
transformed into proportions, p(i,j,k),
by dividing the cell frequencies by the 
total number of cases in the matrix. 
This matrix can be collapsed across 
any one variable to form three differ- 
ent two-variable matrices, or it can be 
collapsed across two variables to give 
distributions for one variable at a
time. When two-variable matrices
are used, the proportions in the cells
are still computed as the frequency in
a cell divided by the total number of
cases, and can be obtained by sum-
ming the proportions across the col-
lapsed variable.

Fig. 1. Symbolic representation of a
three-variable matrix of data. Three vari-
ables, w, x, y, have category values w_i, x_j, and
y_k. Each cell in the matrix originally has a
frequency which is divided by the total num-
ber of cases in the matrix to give the propor-
tion p(i,j,k). Summing these proportions
across any variable gives two-variable mat-
rices with cell entries of p(i,k), p(j,k), or
p(i,j) as indicated on the three surfaces.
Summing across two variables gives the
marginal proportions p(i), p(j), and p(k).

Unidirectional analysis. When un- 
certainty is partitioned as in analysis 
of variance (3, 4), we start by identify- 
ing one of the variables, let us say y, 
as the criterion variable and the other 
two, w and x, as predictor variables. 
Now our problem is to partition the 
uncertainty of y into its component 
parts. The basic partitioning equa- 
tion is 


U(y) = U(y:w,x) + U_{wx}(y)    [1]


where U(y) is the uncertainty of the 
y-distribution, based on p(k); U(y: 
w,x) is the total uncertainty in y 
which is predictable from w and x, 
and it is called a multiple contingent
uncertainty; and U_{wx}(y) is the residual
or error uncertainty.² This last term
is a conditional uncertainty, and is 
computed by taking one combination 
of w and x at a time, determining the 
uncertainty of the y distribution, and 
then obtaining a weighted average 
for all of the w,x cells. 
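
As a minimal sketch of Equation 1, the following Python fragment computes the three terms from a small table of frequencies. The table itself, the axis order [w, x, y], and the helper name `uncertainty` are illustrative assumptions and not part of the original analysis; logarithms are taken to base 2, so all results are in bits.

```python
import numpy as np

def uncertainty(p):
    """Uncertainty in bits of a set of proportions that sums to 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                       # 0 log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())

# Hypothetical frequencies, indexed [w, x, y] (2 x 2 x 3 categories).
counts = np.array([[[8, 1, 1], [2, 6, 2]],
                   [[1, 2, 7], [3, 3, 4]]], dtype=float)
p = counts / counts.sum()              # proportions p(i,j,k)

u_y = uncertainty(p.sum(axis=(0, 1)))  # U(y), from the marginal p(k)

# Error term: uncertainty of y inside each (w, x) cell, weighted by p(i,j).
p_wx = p.sum(axis=2)
u_error = sum(
    p_wx[i, j] * uncertainty(p[i, j] / p_wx[i, j])
    for i in range(p.shape[0])
    for j in range(p.shape[1])
    if p_wx[i, j] > 0
)

u_multiple = u_y - u_error             # U(y:w,x), by transposing Equation 1
print(round(u_y, 3), round(u_multiple, 3), round(u_error, 3))
```

The weighted average computed here is the same quantity as U(w,x,y) - U(w,x), which offers a quick check on the calculation.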

By transposing terms in Equation 
1, we can define the multiple contin- 
gent uncertainty as 


U(y:w,x) = U(y) - U_{wx}(y)    [2]


The right hand side of this equation 
can then be rewritten to show the 
complete partitioning of the predict- 
able uncertainty as 


U(y:w,x) = U(y:w) + U(y:x) + U(y:wx)    [2a]


2 For many of the equations presented here 
we shall not present calculational formulas or 
complete proofs, since these are available in 
at least a related form in other articles (es- 
pecially 2, 3, 4, 5).

The three terms on the right of Equa-
tion 2a represent, in order, the uncer-
tainty in y which is predictable from
w, that which is predictable from x,
and that which is predictable from
unique combinations of w and x. The
first two terms are simple contingent 
uncertainties, and are normally com- 
puted from collapsed two-variable 
matrices. The last term is an inter- 
action uncertainty, and is computed 
either as a residual or as the difference 
between a simple contingent uncer- 
tainty and the same contingent un- 
certainty computed with the third 
variable held constant, rather than 
with the matrix collapsed over that 
variable. (See 3, 4.) 

The partitioned uncertainties on 
the right of Equation 2a are com- 
pletely analogous to partitioned vari- 
ances in analysis of variance, and 
would in that case be called the main 
effect due to w, the main effect due to 
x, and the interaction between w and 
x. 
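
The partition in Equation 2a can be sketched numerically as follows. The frequency table and the helper names `uncertainty` and `U` are hypothetical, chosen only for illustration. The interaction term is obtained both ways described above: as a residual, and as the contingent uncertainty with the third variable held constant minus the same term with the matrix collapsed.

```python
import numpy as np

def uncertainty(p):
    """Uncertainty in bits of a set of proportions that sums to 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def U(p, *axes):
    """Joint uncertainty of the variables on `axes`; other axes are collapsed."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    return uncertainty(p.sum(axis=drop) if drop else p)

W, X, Y = 0, 1, 2
# Hypothetical frequencies, indexed [w, x, y].
counts = np.array([[[8, 1, 1], [2, 6, 2]],
                   [[1, 2, 7], [3, 3, 4]]], dtype=float)
p = counts / counts.sum()

u_y_w = U(p, Y) + U(p, W) - U(p, W, Y)          # U(y:w), collapsed over x
u_y_x = U(p, Y) + U(p, X) - U(p, X, Y)          # U(y:x), collapsed over w
u_y_wx = U(p, Y) + U(p, W, X) - U(p, W, X, Y)   # U(y:w,x)

# Interaction as the residual of Equation 2a.
interaction = u_y_wx - u_y_w - u_y_x

# Interaction as U(y:w) with x held constant, minus U(y:w) collapsed over x.
u_y_w_x_const = U(p, X, Y) + U(p, W, X) - U(p, X) - U(p, W, X, Y)
interaction_alt = u_y_w_x_const - u_y_w

print(round(interaction, 6), round(interaction_alt, 6))
```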

Symmetric analysis. This approach 
to uncertainty analysis has the ad- 
vantage of being completely analogous 
to the more familiar analysis of vari- 
ance. It is possible, however, to ap- 
proach uncertainty analysis from an- 
other point of view which allows us to 
deal with the total matrix of data as a
unit. We can call this type a sym- 
metric uncertainty analysis to dis- 
tinguish it from the unidirectional 
analysis in which the uncertainty of 
just one variable is partitioned. 

Since uncertainty analysis is non- 
metric, we can look at the three-di- 
mensional matrix shown in Fig. 1 as a 
single frequency distribution in which 
we have categories and proportions of 
total cases falling in each category. 
Thus we can compute the uncertainty 
of the total matrix of data, 


U(w,x,y) = -\sum_{i,j,k} p(i,j,k) \log_2 p(i,j,k)    [3]

where U(w,x,y) is the total uncer-
tainty in the three-dimensional ma-
trix, and p(i,j,k) is as defined above.
Now this total uncertainty can be 
broken down in many different ways, 
some of them being 

U(w,x,y) = U(y) + U_y(x) + U_{xy}(w)    [4a]
         = U(w) + U_w(x) + U_{wx}(y)    [4b]
         = U(w,x) + U_{wx}(y)           [4c]
         = U(w,y) + U_{wy}(x)           [4d]
         = U(x,y) + U_{xy}(w)           [4e]

For these equations, the only terms 
needing additional explanation are 
U_y(x) or equivalent forms, and U(w,x)
or equivalent forms. Any term of the
form U_y(x) is a conditional uncer-
tainty, and is computed by taking one
value at a time of the subscripted 
variable, determining the uncertainty 
of the variable in parentheses, then 
obtaining a weighted average over the 
subscripted variable. For such a cal- 
culation, the matrix is collapsed over 
the third variable. A term of the 
form U(w,x) is a total uncertainty for 
a two-dimensional matrix, i.e., one 
collapsed over the third variable. It 
is computed from the p(i,j), in this
case, or the appropriate terms from 
any other combination of two vari- 
ables. 

U(w,x,y) is a measure of the total 
uncertainty actually obtained in the 
matrix. We can also determine the 
maximum total uncertainty which 
could have been obtained from the 
matrix by determining what propor- 
tion of cases would fall in each cell if 
there were no contingencies between 
any of the variables. This ideal 
proportion, P(i,j,k), is


P(i,j,k) = p(i) p(j) p(k)    [5]


in much the same way as is done for 
chi square. From these ideal propor-
tions, representing those which would
be obtained in a matrix with all vari- 
ables completely independent, we can 
compute the maximum uncertainty 
in the matrix, U_{max}(w,x,y), as


U_{max}(w,x,y) = -\sum_{i,j,k} p(i) p(j) p(k) \log_2 [p(i) p(j) p(k)]    [6]


This maximum uncertainty can itself 
be partitioned into components, and in 
this case the partitioning results in a 
simple set of terms. 


U_{max}(w,x,y) = U(w) + U(x) + U(y)    [7]


In other words, the maximum uncer- 
tainty which can be obtained in a 
matrix which is restricted only by the 
marginal totals is the sum of the un- 
certainties for the three variables 
taken one at a time. 
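
The step from Equation 6 to Equation 7 can be written out as follows. This is a sketch that uses only the fact that each set of marginal proportions sums to one, so that the two sums not attached to a logarithm drop out of each term.

```latex
\begin{aligned}
U_{\max}(w,x,y)
  &= -\sum_{i,j,k} p(i)\,p(j)\,p(k)\,
     \bigl[\log_2 p(i) + \log_2 p(j) + \log_2 p(k)\bigr] \\
  &= -\sum_{i} p(i)\log_2 p(i)
     \;-\; \sum_{j} p(j)\log_2 p(j)
     \;-\; \sum_{k} p(k)\log_2 p(k) \\
  &= U(w) + U(x) + U(y)
\end{aligned}
```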

Now if we have the maximum un- 
certainty which can be obtained in a 
matrix, and the uncertainty which 
actually is obtained, the difference 
between them must give us a measure 
of the total contingency or interre- 
latedness in the three-dimensional 
matrix. We can define a measure of 
this total contingency as 

U(w:x:y) = U_{max}(w,x,y) - U(w,x,y)    [8]

The term on the left is the total con- 
tingent uncertainty in the matrix, and 
the colon notation has been used 
as for other contingent uncertainty 
terms. It should be noted, however, 
that there is no implication of direc- 
tion in the notation; the three sym- 
bols for the three variables can be 
written in any order. 

The total contingent uncertainty 
can be partitioned also, and the use 
of Equation 7 with Equations 4c, 
4d, or 4e leads to three different forms 
of the partitioning.

U(w:x:y) = U(w:x) + U(y:w,x)    [9a]
         = U(w:y) + U(x:w,y)    [9b]
         = U(x:y) + U(w:x,y)    [9c]


All terms in these equations have been 
defined above, and need no further 
clarification here. These equations 
show that the total contingent uncer- 
tainty can be partitioned into a multi- 
ple contingent uncertainty in which 
one variable is predicted from the 
other two, plus a simple contingent 
uncertainty between the two vari- 
ables used as predictors, and it makes 
no difference which variable is treated 
as a criterion and which as predictors 
for this statement to be true. 
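
A numerical check of the three forms of Equation 9 might look like the sketch below. The frequency table and the helper names (`uncertainty`, `U`, `contingent`) are assumptions made for illustration; any three-variable table of proportions should give the same agreement.

```python
import numpy as np

def uncertainty(p):
    """Uncertainty in bits of a set of proportions that sums to 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def U(p, *axes):
    """Joint uncertainty of the variables on `axes`; other axes are collapsed."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    return uncertainty(p.sum(axis=drop) if drop else p)

def contingent(p, a, b):
    """U(a:b) = U(a) + U(b) - U(a,b) for two disjoint groups of axes."""
    return U(p, *a) + U(p, *b) - U(p, *a, *b)

W, X, Y = 0, 1, 2
# Hypothetical frequencies, indexed [w, x, y].
counts = np.array([[[8, 1, 1], [2, 6, 2]],
                   [[1, 2, 7], [3, 3, 4]]], dtype=float)
p = counts / counts.sum()

# Equation 8: maximum uncertainty (Equation 7) minus obtained uncertainty.
total = U(p, W) + U(p, X) + U(p, Y) - U(p, W, X, Y)

partitions = {
    "9a": contingent(p, (W,), (X,)) + contingent(p, (Y,), (W, X)),
    "9b": contingent(p, (W,), (Y,)) + contingent(p, (X,), (W, Y)),
    "9c": contingent(p, (X,), (Y,)) + contingent(p, (W,), (X, Y)),
}
for label, value in partitions.items():
    print(label, round(value, 6), round(total, 6))
```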

As in Equation 2a, each of the three 
multiple contingent uncertainties can 
itself be partitioned into components, 
so that the total contingent uncer- 
tainty can be partitioned more com- 
pletely. 


U(w:x:y) = U(w:x) + U(w:y) + U(x:y) + U(y:wx)    [10a]
         = U(w:x) + U(w:y) + U(x:y) + U(x:wy)    [10b]
         = U(w:x) + U(w:y) + U(x:y) + U(w:xy)    [10c]


Thus the total contingent uncertainty
consists of the three simple contingent
uncertainties involving the three pairs
of predictors, plus the interaction
term. Although the three interaction
terms have been written differently
for each of the above equations, it is
clear from these equations that all
three interaction uncertainties are
equal. In other words, there exists in
the matrix all of the two-variable con- 
tingencies, plus a single interaction 
term which truly represents a three- 
dimensional pattern of inconsistencies. 
For this reason, it would be better to 
write the interaction uncertainty as
U(wxy) to make clear the uniqueness
of that term for the given matrix. 
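
The symmetry of the interaction term can be made explicit by a short derivation. The sketch below starts from Equation 10, substitutes Equation 8 (with Equation 7) for the total contingent uncertainty, and writes each simple contingent uncertainty as the sum of the two single-variable uncertainties minus their joint uncertainty, e.g., U(w:x) = U(w) + U(x) - U(w,x). The result treats all three variables alike.

```latex
\begin{aligned}
U(wxy) &= U(w{:}x{:}y) - U(w{:}x) - U(w{:}y) - U(x{:}y) \\
       &= \bigl[U(w) + U(x) + U(y) - U(w,x,y)\bigr]
          - \bigl[U(w) + U(x) - U(w,x)\bigr] \\
       &\quad - \bigl[U(w) + U(y) - U(w,y)\bigr]
          - \bigl[U(x) + U(y) - U(x,y)\bigr] \\
       &= U(w,x) + U(w,y) + U(x,y) - U(w) - U(x) - U(y) - U(w,x,y)
\end{aligned}
```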

We can transpose the terms in 
Equation 9 to see the relation of each 
of the three possible multiple contin- 
gent uncertainties to the total con- 
tingent uncertainty, and we obtain 


U(y:w,x) = U(w:x:y) - U(w:x)    [11a]
U(x:w,y) = U(w:x:y) - U(w:y)    [11b]
U(w:x,y) = U(w:x:y) - U(x:y)    [11c]


With the equations written in this 
form, some interesting relations be- 
tween variables become clear. Each 
of these equations shows the multiple 
contingent uncertainty for predicting 
one of the three variables from the 
other two. The form of the equations 
is such that the amount of prediction 
available for each variable is equal to 
a constant less a variable term, which 
is in each case the contingent uncer- 
tainty between the two predictor vari- 
ables. In other words, the amount of 
predictable uncertainty available for 
any variable in the matrix is the total 
contingent uncertainty less the con- 
tingent uncertainty between the two 
variables from which the predicting is 
being done. Therefore, the amount 
of prediction available in any direction 
in the matrix is inversely related to 
the contingency between the two pre- 
dictors. For a given matrix the max- 
imum possible predictability will be 
obtained when the predictor variables 
are orthogonal, i.e., have no contingent 
uncertainty between them. This 
seems intuitively sensible, and the 
uncertainty analysis presented here 
makes it clear that for any set of data 
this relation must hold. 

In order to compute the exact 
amount of a multiple contingent un-
certainty involving one criterion and
two predictor variables, it is necessary 
to carry out calculations which in- 
volve the three-dimensional matrix. 
For many sets of data, such a calcula- 
tion would be extremely laborious, if 
not prohibitive. Equation 11 shows 
that it is possible to determine how 
much better prediction is for one 
variable than for another without 
actually carrying out a complete three- 
dimensional calculation. The amount 
by which one of the three multiple 
contingent uncertainties is greater 
than another is exactly the same as the 
difference in the contingent uncer- 
tainties between the two sets of pre-
dictor variables, and these uncer-
tainties can be calculated with the
three-dimensional matrix collapsed
to two dimensions. Actually, the
interaction term is the one which re- 
quires a three-dimensional calculation 
to determine all the components of the 
multiple contingent uncertainty (see 
Equation 2a), and the essential fact 
which makes it possible to determine 
exact differences between multiple 
contingent uncertainties without the 
complete calculation is that the inter- 
action uncertainty, as mentioned 
above, is the same for all three cri- 
terion variables. It is truly a term 
which applies to the complete matrix 
of data, and is not uniquely deter- 
mined for each variable used as a 
criterion, as it would appear to be from 
a unidirectional partitioning approach 
to uncertainty analysis. 
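
The shortcut implied by Equation 11 can be sketched as follows: the difference between two multiple contingent uncertainties is computed from collapsed two-variable tables alone and then compared with the full three-dimensional calculation. The frequency table and the helper names `uncertainty` and `contingent2` are hypothetical.

```python
import numpy as np

def uncertainty(p):
    """Uncertainty in bits of a set of proportions that sums to 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def contingent2(table):
    """Contingent uncertainty between the rows and columns of a two-variable table."""
    return (uncertainty(table.sum(axis=1)) + uncertainty(table.sum(axis=0))
            - uncertainty(table))

# Hypothetical frequencies, indexed [w, x, y].
counts = np.array([[[8, 1, 1], [2, 6, 2]],
                   [[1, 2, 7], [3, 3, 4]]], dtype=float)
p = counts / counts.sum()

p_wx = p.sum(axis=2)   # collapsed over y
p_wy = p.sum(axis=1)   # collapsed over x

# From Equations 11a and 11b: U(x:w,y) - U(y:w,x) = U(w:x) - U(w:y).
diff_from_2d = contingent2(p_wx) - contingent2(p_wy)

# The same difference from the complete three-dimensional calculation.
u_y_wx = uncertainty(p.sum(axis=(0, 1))) + uncertainty(p_wx) - uncertainty(p)
u_x_wy = uncertainty(p.sum(axis=(0, 2))) + uncertainty(p_wy) - uncertainty(p)
diff_from_3d = u_x_wy - u_y_wx

print(round(diff_from_2d, 6), round(diff_from_3d, 6))
```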


HIGHER-ORDER MATRICES 


The entire discussion so far has 
been concerned with three-dimen- 
sional matrices. It is necessary to 
use at least three to show the nature 
of the principles, and it is easier to use 
just three to avoid undue complica- 
tion. However, a brief discussion of 
at least the four-variable case will
illustrate some additional factors.
We noted above that it is necessary 
to have two-dimensional uncertainty 
calculations in order to determine 
the differences between the various 
three-dimensional, multiple contin- 
gent uncertainties. In other words, 
calculations with as many variables as 
there are in the matrix are necessary 
to obtain absolute numbers for the 
multiple contingent uncertainties, but 
calculations with one less variable 
suffice to determine the differences 
between the multiple terms. A simi- 
lar situation exists for the general case, 
as is shown by the following equations, 
in which v is the fourth variable:


U(y:v,w,x) = U(v:w:x:y) - U(v:w:x)    [12a]
U(x:v,w,y) = U(v:w:x:y) - U(v:w:y)    [12b]
U(w:v,x,y) = U(v:w:x:y) - U(v:x:y)    [12c]
U(v:w,x,y) = U(v:w:x:y) - U(w:x:y)    [12d]


Each of the four multiple contin- 
gent uncertainties is the total contin- 
gent uncertainty in the four-variable 
matrix, U(v:w:x:y), less the total 
contingent uncertainty between the 
three variables used as predictors, and 
exact differences between the multi- 
ples can be obtained with three-vari- 
able calculations. Similar conclu- 
sions concerning the variable whose 
prediction will be maximum can be 
made, since any variable which can be 
predicted from orthogonal predictor 
variables will have all of the contin- 
gent uncertainty in the matrix avail- 
able for prediction of that variable. 
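
For the four-variable case, Equation 12 can be checked with a sketch like the one below. The table is filled with arbitrary random proportions purely for illustration, and the helper names (`U`, `total_contingent`, `multiple`) are assumptions rather than established terminology.

```python
import numpy as np

def uncertainty(p):
    """Uncertainty in bits of a set of proportions that sums to 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def U(p, *axes):
    """Joint uncertainty of the variables on `axes`; other axes are collapsed."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    return uncertainty(p.sum(axis=drop) if drop else p)

def total_contingent(p, axes):
    """Sum of the single-variable uncertainties minus the joint uncertainty."""
    return sum(U(p, a) for a in axes) - U(p, *axes)

def multiple(p, criterion, predictors):
    """U(criterion : predictors) = U(criterion) + U(predictors) - U(all)."""
    return U(p, criterion) + U(p, *predictors) - U(p, criterion, *predictors)

# Hypothetical table of proportions, indexed [v, w, x, y].
rng = np.random.default_rng(0)
p = rng.random((2, 3, 2, 3))
p /= p.sum()

V, W, X, Y = range(4)
all_axes = (V, W, X, Y)
total = total_contingent(p, all_axes)              # U(v:w:x:y)

checks = []
for criterion in all_axes:
    predictors = tuple(a for a in all_axes if a != criterion)
    lhs = multiple(p, criterion, predictors)
    rhs = total - total_contingent(p, predictors)  # right side of Equation 12
    checks.append((lhs, rhs))
    print(criterion, round(lhs, 6), round(rhs, 6))
```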

Many situations exist, of course, 
where the data matrix has four or more 
variables, but where it is impractical 
or impossible to compute interaction 
terms involving even three variables.
We noted above that when only three
variables are involved, exact differ- 
ences between the multiples can be 
determined because there is only one 
interaction term for the entire matrix. 
When four or more variables are in- 
volved, however, the interaction terms 
involved in each multiple contingent 
uncertainty are not identical. For 
example, with four variables, each
multiple contingency requires four
interaction terms, three involving
three variables, and one involving all
four variables. For any pair of multiple
contingent uncertainties, three of 
these four interaction terms are identi- 
cal, but the fourth will normally be 
different. As an example, the com- 
plete set of terms involved in two of 
the four multiple contingent uncer- 
tainties are 


U(y:v,w,x) = U(y:v) + U(y:w) + U(y:x)
           + U(vwy) + U(vxy) + U(wxy) + U(vwxy)    [13a]
U(x:v,w,y) = U(x:v) + U(x:w) + U(x:y)
           + U(vwx) + U(vxy) + U(wxy) + U(vwxy)    [13b]


where U(vwxy) is the four-variable 
interaction. Notice that only the 
terms U(vwx) and U(vwy) are differ- 
ent interactions for these two multi- 
ples, and that the other three inter- 
actions in each case are identical. 
This fact means that estimated dif- 
ferences between the various multiple 
contingent uncertainties can be ob- 
tained with little error simply by add- 
ing up the two-variable contingent 
uncertainties involved in each case. 
The total difference in the interaction 
uncertainties must be small, since only 
one of four terms is different. All 
four interaction terms, of course, are 
needed to determine the absolute 
magnitude of the multiple contingent 
uncertainty, and, therefore, more error 
in estimating the absolute magnitude
is likely in that case. When even
more variables are included, the ap- 
proximation by adding up the two- 
variable contingent uncertainties is 
more risky, since many more inter- 
action terms are involved. However, 
it should be remembered (3, 4, 5) that 
interaction terms can take negative as 
well as positive values, so that it is
often still reasonable to assume that 
the sum of all the interaction terms 
is negligible, even though any one of 
them may not be. 


PSYCHOLOGICAL APPLICATIONS 


The mathematical developments 
briefly described in the preceding 
sections have many interesting appli- 
cations and implications in psycho- 
logical research. In some cases, their 
primary value is to clarify the mean- 
ing of computations which are com- 
monly carried out with the uncertainty 
measure. In other cases, these equa- 
tions lead to some quite specific pre- 
dictions about behavior which is re- 
lated to nonrandom sequential events. 
A few of these implications will be 
discussed as illustrative cases. 


Multivariate Information Transmission 


One common application of infor- 
mation theory in psychology has been 
the use of information-transmission 
measures to determine the “channel
capacity” of the human observer in
transmitting, verbally or otherwise, 
information about a series of events or 
stimuli. Following the paradigm de- 
scribed by Garner and Hake (2), a 
typical experiment requires subjects 
to make an absolute response to a 
series of stimuli varying along some 
dimension, e.g., brightness or loud- 
ness, and the experimental result is 
expressed as an amount of informa- 
tion transmitted between stimuli and 
If the information trans- 


responses. 
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mitted is less than the information 
contained in the stimuli and in the re- 
sponses, it is assumed to represent a 
channel capacity for the observer. In 
the more general terminology of the 
present paper, information transmis- 
sion is a two-variable contingent un- 
certainty. 

Although many experiments have 
been done using this approach, for 
illustrative purposes we will refer to 
one by Garner (1). In this experi- 
ment, observers were asked to make 
an identifying response on a twenty-
point scale to twenty different inten-
sities of a tone. From the two-dimen-
sional matrix involving just stimuli and
responses, the contingent uncertainty 
between stimuli and responses is com- 
puted, and this figure can be inter- 
preted as information transmitted. 
For this two-variable case, it makes 
no difference whether we state that 
information has been transmitted 
from stimuli to responses or vice versa,
or whether we talk about the amount 
of uncertainty in stimuli which can 
be predicted from responses, or vice 
versa. The measure of information 
transmission is completely bilateral, 
and can be given interpretation in 
either direction. 

Additional computations were made 
in this experiment, however, to allow 
multiple predictions. For example, 
the multiple contingent uncertainty 
for predicting the response from knowl-
edge of the observer and of the stimu-
lus was calculated, and this multiple 
contingent uncertainty was, naturally, 
larger than the simple contingent un- 
certainty. It seems quite reasonable 
that if more variables are isolated for 
prediction purposes, more prediction 
should be available. 

However, Equation 11 shows that, 
once more than one variable is used 
as a predictor, the bilateral interpre-
tation of the contingent uncertainty
is no longer possible. Let us look at
this over-all situation as a three-vari-
able matrix involving stimuli (S),
responses (R), and observers (O).
Now the multiple contingent uncer- 
tainty for predicting R from S and O 
is the total contingent uncertainty in 
the matrix minus the contingent un- 
certainty between S and O, the 
two predictor variables. In symbolic 
form, 

U(R:S,O) = U(R:S:O) - U(S:O)    [14a]

In this experiment, however, as in 
most such experiments, all observers 
received the same stimuli. Therefore,
there is orthogonality between S and 
O, and the contingent uncertainty 
between them will be zero. Thus 
the contingent uncertainty available 
for predicting R is the same as the 
total contingent uncertainty in the 
three-dimensional matrix, and is at 
a maximum for this particular matrix. 

Suppose now that we decide to 
predict stimuli from responses and 
observers—the type of prediction 
which is more realistic, since in the 
ordinary situation the response is 
transmitted, and from it and knowl- 
edge of the observer we make infer- 
ences or predictions about the stimu- 
lus. In this case, in symbolic form, 


U(S:R,O) = U(R:S:O) - U(R:O)    [14b]

Again the multiple contingent uncer- 
tainty is the total contingent uncer- 
tainty in the matrix minus the con- 
tingent uncertainty between the two 
predictors, R and O. But in this case
there is no reason why R and O should 
be orthogonal, and, in fact, if they are, 
there is little point to adding O as 
a predictor variable. Since U(R:O) 
will be real and positive, prediction of 
S from R and O must be less than pre-
diction of R from S and O, as in the
former case. 

These relations do not indicate that 
no improvement in prediction of S can 
be obtained by making use of knowl- 
edge about O, but only that prediction 
of S cannot be as great as prediction 
of R. If we consider only the predic- 
tion of S, then prediction from both R 
and from O can be broken down as 
follows: 


U(S:R,O) = U(S:R) + U(S:O) + U(SRO)    [15]


The third term on the right is the 
interaction. This equation shows the 
problem clearly, since no prediction of 
S can be obtained from just O as long 
as S and O are orthogonal. Addi- 
tional prediction may, however, be 
obtained from the interaction term, 
which is to say that only inconsisten- 
cies in relations involving observers 
are useful in predicting S. For pre- 
dicting R, however, both inconsistent 
and systematic differences by O in use 
of the R continuum provide additional 
prediction. 

A simple example will illustrate 
these relations. Suppose that two 
observers rate two stimuli with per- 
fect consistency, but one observer 
uses the numbers “1” and “2”, while
the other uses the numbers “3” and
“4”, for the first and second stimuli,
respectively. Now if we know the re- 
sponse we can perfectly predict the 
stimulus, and it is no help to us to 
know which observer uses which re- 
sponses. If we use the stimulus to 
predict the response, however, we 
must first know which observer is in- 
volved to know which response is 
used. On the other hand, suppose 
that both observers use the numbers 
“1” and “2”, but in reverse relation
to the two stimuli. Knowing the
stimulus alone is now of no value in
predicting the response, or vice versa.
However, if knowledge of the observer
is added, then perfect prediction of 
the response from the stimulus or of 
the stimulus from the response is 
possible, due to the fact that the 
addition of observers as a predictor 
variable adds an interaction term in 
which all of the available prediction 
lies. 
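
The two-observer example can be worked through numerically, as in the sketch below. The encoding is an assumption made for illustration: stimuli and observers are coded 0 and 1, the four combinations of stimulus and observer are taken as equally frequent, and the responses “1” to “4” are coded 0 to 3. The helper names are hypothetical.

```python
import numpy as np

def uncertainty(p):
    """Uncertainty in bits of a set of proportions that sums to 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def U(p, *axes):
    """Joint uncertainty of the variables on `axes`; other axes are collapsed."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    return uncertainty(p.sum(axis=drop) if drop else p)

def contingent(p, a, b):
    """U(a:b) = U(a) + U(b) - U(a,b) for two disjoint groups of axes."""
    return U(p, *a) + U(p, *b) - U(p, *a, *b)

def table(rule):
    """Proportions indexed [S, R, O]; `rule(s, o)` gives the response code."""
    p = np.zeros((2, 4, 2))
    for s in range(2):
        for o in range(2):
            p[s, rule(s, o), o] = 0.25
    return p

S, R, O = 0, 1, 2
cases = {
    "separate numbers": table(lambda s, o: s + 2 * o),  # "1","2" vs "3","4"
    "reversed numbers": table(lambda s, o: s ^ o),      # same numbers, reversed
}

results = {}
for name, p in cases.items():
    results[name] = {
        "U(S:R)":   round(contingent(p, (S,), (R,)), 6),
        "U(S:R,O)": round(contingent(p, (S,), (R, O)), 6),
        "U(R:S)":   round(contingent(p, (R,), (S,)), 6),
        "U(R:S,O)": round(contingent(p, (R,), (S, O)), 6),
    }
    print(name, results[name])
```

In the first case the observer adds one bit to the prediction of R (from 1 to 2 bits) and nothing to the prediction of S (1 bit either way). In the second case both simple contingent uncertainties are zero, and the full bit of prediction in either direction comes from the interaction term.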

As a concrete example, these two
cases are extreme and serve only to 
illustrate the principles. One fact 
remains, however, which is that, in 
such a case, prediction of stimuli can 
never be better than prediction of re- 
sponses, and will normally be some- 
what poorer. 

This type of problem can be carried 
even further. In the same experi- 
ment, Garner actually determined the 
multiple contingent uncertainty when 
three predictor variables were used, 
the fourth variable being the stimulus 
which preceded the one being judged 
(P). Furthermore, in the experi- 
mental design, all pairs of stimuli were 
made to occur equally often, so that 
P and S were orthogonally related. 
Now the two types of prediction are 


U(R:S,O,P) = U(R:S:O:P) - U(S:O:P)    [16a]

and

U(S:R,O,P) = U(R:S:O:P) - U(R:O:P)    [16b]


following the form of Equation 12. 
In this case, when R is predicted from 
S, O, and P, the multiple contingent 
uncertainty is the total contingent 
uncertainty in the four-variable ma- 
trix minus the total contingent uncer- 
tainty in the three-variable matrix in- 
volving the three predictor variables. 
Since these three predictor variables 
were all orthogonal in the experimen- 
tal design, this term is zero, and this 
multiple contingent uncertainty con-
tains all of the contingent uncertainty
available in the matrix. 

When, however, S is predicted from 
the other three variables, the orthog- 
onal condition does not hold, since 
of the three variables only O and P 
are orthogonal. This term will thus 
be positive, and will be fairly sub- 
stantial, since it includes contingent 
uncertainties between R and O, and 
between R and P, as well as the three- 
variable interaction which must (in 
this case) also be positive. Thus, 
there would be substantial error in 
assuming that these two multiple 
contingent uncertainties are the 
same. 


Redundancy of Printed English 


One common application of infor- 
mation theory and uncertainty analysis 
is to situations where events occur in 
a sequence which is not completely 
random, i.e., in which there are se- 
quential dependencies. Much of hu-
man behavior is concerned with making
predictions about events which are 
unknown, or only partially known, on 
the basis of sequentially provided in- 
formation. Language is a very good 
example of such an event series in 
which there exist sequential depend- 
encies, and several studies have been 
done on printed English to determine 
how much predictability is available 
from sequences of letters. The term 
“redundancy” has been used to de- 
scribe the amount of uncertainty in a 
given letter which is predictable on the 
basis of other known letters. The 
equations described above point out 
some interesting aspects of redundancy 
of printed English. 

When we do an analysis of se- 
quences of letters, we form a data 
matrix in which the series of letters 
itself is one variable, and the series 
displaced by one or more units consti-
tute the other variables. For present
purposes we shall refer to the first
order series as variable w, the series 
displaced one step as variable x, and 
the series displaced two steps as vari- 
able y, and we will be dealing with 
a three-dimensional matrix. In this 
special three-variable case there are 
some restrictions on the contingent 
uncertainties which exist. For ex- 
ample, we know that the term U(w:x) 
is identical to the term U(x:y), since
in both cases we are describing the 
contingent uncertainty for a one-step 
displacement of the series. In ad- 
dition, the contingent uncertainty 
U(w:y) is less than either of the others,
since that is the contingency for a 
two-step displacement, and a letter 
cannot be predicted as well from one 
two steps behind as from the immedi- 
ately preceding one. 

Now we know from Equation 11 
that the multiple contingent uncer- 
tainty for predicting x, the middle 
letter of a series of three, must be 
greater than that for predicting either 
w or y from the other two, since when 
x is being predicted, the contingent 
uncertainty between the two predictor 
variables, w and y, is less than the 
contingent uncertainties between the 
other pairs of predictor variables. In 
other words, prediction of the middle 
letter of a sequence of three must be 
better than prediction of either end of 
the sequence on the basis of the other 
two. Similar relations hold for longer 
sequences of letters, and we can make 
the general statement that if any 
fixed number of letters is available for 
prediction, the best prediction will 
occur if the letter being predicted is in 
the middle of the total sequence rather 
than at either end. Miller and Fried- 
man (6) have experimentally shown 
this to be true for sequences of 5, 7, 
and 11 letters. 
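
The three-letter argument can be illustrated with a toy sequence. The sketch below does not use printed English; it assumes a hypothetical three-symbol alphabet generated by a stationary first-order chain with an arbitrary transition table `T`, builds the three-variable matrix for w, x, and y exactly, and compares prediction of the middle position with prediction of an end position.

```python
import numpy as np

def uncertainty(p):
    """Uncertainty in bits of a set of proportions that sums to 1."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def U(p, *axes):
    """Joint uncertainty of the variables on `axes`; other axes are collapsed."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    return uncertainty(p.sum(axis=drop) if drop else p)

# Hypothetical transition proportions: row = current symbol, column = next.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])

# Long-run symbol proportions, found by repeated application of T.
pi = np.full(3, 1 / 3)
for _ in range(1000):
    pi = pi @ T

# p(i,j,k) for three successive symbols w, x, y.
p = pi[:, None, None] * T[:, :, None] * T[None, :, :]
W, X, Y = 0, 1, 2

u_wx = U(p, W) + U(p, X) - U(p, W, X)   # one-step displacement
u_xy = U(p, X) + U(p, Y) - U(p, X, Y)   # one-step displacement
u_wy = U(p, W) + U(p, Y) - U(p, W, Y)   # two-step displacement

middle = U(p, X) + U(p, W, Y) - U(p, W, X, Y)   # U(x:w,y)
end = U(p, Y) + U(p, W, X) - U(p, W, X, Y)      # U(y:w,x)

print(round(u_wx, 4), round(u_xy, 4), round(u_wy, 4))
print(round(middle - end, 4), round(u_wx - u_wy, 4))
```

For this toy chain the two one-step terms agree, the two-step term is smaller, and the advantage of the middle position equals the difference between the one-step and two-step contingent uncertainties, as Equation 11 requires.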

Newman and Gerstman (7) have 
provided data which allow us to see
just how much better prediction of
middle letters should be, compared to 
prediction of end letters. Their data 
show, for the three-letter case, that 
the contingent uncertainty for a one- 
step displacement is 0.91 bit, and for 
a two-step displacement it is 0.42 bit. 
This difference is 0.49 bit, and refer- 
ence to Equation 11 shows that this is 
the exact amount by which prediction 
of the middle of three letters will be 
better than prediction of either end. 
This is a substantial difference. 
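
The arithmetic can be written out as a short derivation. This is a sketch which assumes that Equation 11 expresses a multiple contingent uncertainty as the total contingent uncertainty of the matrix less the contingent uncertainty between the predictor variables, as the preceding paragraph implies. The symbols U(w:x:y) for the total and U(x:w,y) for the multiple contingent uncertainty are notation adopted here for illustration only.

```latex
\begin{align*}
U(x:w,y) - U(w:x,y)
  &= \bigl[U(w:x:y) - U(w:y)\bigr] - \bigl[U(w:x:y) - U(x:y)\bigr] \\
  &= U(x:y) - U(w:y) \\
  &= 0.91 - 0.42 = 0.49 \text{ bit}
\end{align*}
```

The total term cancels, so only the one-step and two-step contingent uncertainties are needed to obtain the difference.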

As pointed out above, the situation 
becomes more complicated when we 
deal with longer letter sequences be- 
cause of the greater number of inter- 
action terms. However, we can as- 
sume that these terms are negligible, 
and determine the multiple contingent 
uncertainty for predicting omitted 
letters from other length sequences 
simply by adding up the various 
two-term contingent uncertainties in- 
volved in the multiple contingent 
uncertainty. Using again the New- 
man and Gerstman data, for an 
eleven-letter sequence, the contingent 
uncertainty for predicting either end 
letter on the basis of the other ten is 
2.17 bits, while that for predicting the 
middle letter, on the basis of five 
letters on each side, is 3.72 bits. For 
seven-letter sequences, prediction of 
the end letters gives a multiple contin- 
gent uncertainty of 1.95 bits, while 
prediction of the middle letter gives 
3.18 bits. These figures compare 
quite well with those found by Miller 
and Friedman, although the agree- 
ment is better for middle letters than 
for ends. It seems, from their data, 
that humans make more effective use 
of information presented symmetri- 
cally than they do of unilateral infor- 
mation, since their data show higher 
residual uncertainties for the ends than 
these calculations predict. 
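
A minimal Python sketch of this additive approximation follows. It assumes the interaction terms are zero, as stated above, and uses only the one-step and two-step values quoted from Newman and Gerstman; the function name and the dictionary layout are illustrative. Longer sequences would need the contingent uncertainties for larger displacements, which are not reproduced here.

```python
def additive_contingent_uncertainty(target, length, by_displacement):
    """Sum the two-term contingent uncertainties between the letter at
    index `target` and every other letter in a sequence of `length`
    letters. Interaction terms are assumed to be negligible."""
    return sum(
        by_displacement[abs(position - target)]
        for position in range(length)
        if position != target
    )


# Contingent uncertainty (bits) by displacement, as quoted in the text.
newman_gerstman = {1: 0.91, 2: 0.42}

# Three-letter case: index 1 is the middle letter, index 0 is an end letter.
middle = additive_contingent_uncertainty(1, 3, newman_gerstman)
end = additive_contingent_uncertainty(0, 3, newman_gerstman)
print(f"middle: {middle:.2f}  end: {end:.2f}  difference: {middle - end:.2f}")
```

For the three-letter case the difference between the two totals is the 0.49 bit noted earlier. The individual totals are only approximations, since the interaction term has been dropped.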

These relations point out the need 
for a more accurate interpretation of 
the meaning of the term redundancy. 
This term has usually been defined for 
prediction of letters on the basis of 
preceding letters only, and the figure 
of 50 per cent redundancy is accepted 
as a reasonable approximation. Such 
a figure, however, can be interpreted 
quantitatively and accurately only 
when it is applied to this particular 
prediction situation. We cannot infer 
from it how many letters in printed 
English can be omitted and still allow 
perfect reconstruction. Clearly, with 
a figure of 50 per cent redundancy for 
one direction only, we cannot expect 
perfect reproducibility unless we have 
this amount available on both sides 
of the item being predicted. In other 
words, we need 100 per cent redun- 
dancy for perfect prediction, and this 
can be obtained only when a single 
letter is omitted from a very long se- 
quence, so that maximum redundancy 
is available on both sides of the letter 
being predicted. Perhaps our think- 
ing on this problem has been limited 
by our natural tendency to deal with 
antecedent and consequent events; 
but we can think of both antecedent 
and subsequent events as determining 
the consequent events, and combina- 
tions of these certainly are more 
effective than either alone. 


Response to a Continuing Series 


A type of behavior for which the re- 
lations shown here have important 
implications is that in which a continuing 
series of events occurs and a human 
must make a differential response 
to each event as it occurs. 
Such tasks as copying telegraphic 
code, taking dictation, reporting tar- 
get positions on a radar, and target 
tracking are all of this type. 

To talk about a simplified task of 
this sort, let us assume that an operator 
is read a series of numbers and he 
is required to push a different button 
for each different number he hears, 
much as he would if he were adding 
up a column of figures on a machine 
as they were being read to him. The 
operator, of course, is trying to carry 
out his task with minimum error, so 
that he wants to get all the informa- 
tion possible about what number was 
read to him. Initially, let us suppose 
also that the numbers do not occur in 
a completely random order, but that 
each number is to some extent pre- 
dictable from the preceding ones in a 
fairly simple way. Such a series, of 
course, will show sequential depend- 
encies, and, with rare exceptions, the 
contingent uncertainties between the 
series and its displacements will be 
largest for adjacent numbers and will 
decrease as the number series is dis- 
placed from itself more and more. 
Now we know from the relations shown 
in this paper that the maximum pre- 
dictability for any single number will 
be obtained if the operator is able to 
use symmetrical prediction of it, i.e., 
to use both forward and backward pre- 
diction. In order to use both forward 
and backward prediction, however, 
he must delay his response to a partic- 
ular number until he has heard some 
subsequent numbers. He must, in 
other words, lag behind the presented 
series. This is, of course, a phenom- 
enon commonly observed in everyday 
life. 

Suppose we want to determine the 
optimum lag which such an operator 
should use. We can make some rea- 
sonable predictions, but these pre- 
dictions will depend on a number of 
factors. 

1. The memory function. It is clear 
that the operator cannot delay his re- 
sponse so long that he cannot remem- 
ber all of the items which are poten- 
tially useful for prediction of any one 
item. Let us assume that a reasonable 
memory span for such a task is 
five units. Now if all five units can 
be remembered equally well, the 
optimum lag for the subject would be 
two units behind the one to which the 
response is being made, since this lag 
will give two units on either side of 
the one in question, with the optimum 
of completely symmetrical prediction. 

Actually, it is unrealistic to assume 
that all of the items can be remem- 
bered equally well. It seems more 
reasonable to assume that the items 
last heard will be more clearly remem- 
bered, and thus that the contingent 
uncertainty available from them will 
be more useful. We can illustrate the 
effects of such a memory function by 
using a weighting function which gives 
greater value to recent items than to 
remote items. Table 1 contains a set 
of purely hypothetical numbers to 
illustrate how such a memory factor 
would affect the optimum lag. The 
first column shows the unit step for 
each item, with “0” being the item to 
which the response is to be made, 
negative numbers referring to items 
which occurred before that one, and 
positive numbers being the later, more 
recent items. Column 2 shows the 
contingent uncertainty for each displacement 
with respect to the “0” 
item. Column 3 shows the weights 
which have been assigned to each item 
being used for predictive purposes, 
with the largest weight for the most 
recent item. Column 4 shows the 
resultant weighted value of the con- 
tingent uncertainty for each step, 
with the total at the bottom. For 
these two columns, we have assumed a 
lag of two steps. The next two col- 
umns show the same calculations for a 
lag of three steps, and the last two 
columns for a lag of one step. 

With an equiweighted function (i.e., 
all weights are 1.0), the maximum 
multiple contingent uncertainty occurs 
with a lag of two, and has a value of 
3.00 bits. When the differential 
weights are used, however, the 
maximum-weighted contingent uncer- 
tainty occurs with a lag of one, and 
has a value of 1.86 bits. Thus, if we 
assume a memory differential for the 
various steps, two things happen: the 
total contingent uncertainty decreases, 
and the optimum lag also decreases. 
In general, where responses are made 
to a continued time series, we would 
expect the optimum lag to be somewhat 
less than that which gives complete 
symmetry to the predictors. 
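
The comparison of lags can be sketched in Python as follows. All numbers in this sketch are hypothetical placeholders chosen for illustration; they are not the Table 1 entries. The sketch assumes a memory span of five units (the item responded to plus four predictors) and weights each remembered item by its recency. The names `CONTINGENT`, `weighted_total`, and `best_lag` are introduced here and do not come from the paper.

```python
# Hypothetical contingent uncertainties (bits) by displacement.
# These are placeholders, NOT the values of Table 1.
CONTINGENT = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.125}
SPAN = 5  # assumed memory span: the target item plus four predictors


def weighted_total(lag, weights_by_age):
    """Weighted contingent uncertainty available for one target item.

    lag: number of items heard after the target before responding.
    weights_by_age: weight of each remembered item; index 0 is the most
        recently heard item, index SPAN - 1 the oldest.
    """
    total = 0.0
    for offset in range(lag - (SPAN - 1), lag + 1):  # oldest to newest
        if offset == 0:
            continue  # the target itself is not a predictor
        age = lag - offset  # 0 for the item heard last
        total += weights_by_age[age] * CONTINGENT[abs(offset)]
    return total


def best_lag(weights_by_age):
    """Return the lag (0 to SPAN - 1) with the largest weighted total."""
    return max(range(SPAN), key=lambda lag: weighted_total(lag, weights_by_age))


equal = [1.0] * SPAN                 # all items remembered equally well
recency = [1.0, 0.8, 0.6, 0.4, 0.2]  # hypothetical memory weights

for name, weights in (("equal", equal), ("recency", recency)):
    lag = best_lag(weights)
    print(name, "best lag:", lag, "bits:", round(weighted_total(lag, weights), 2))
```

With these placeholder numbers the equal-weight optimum falls at a lag of two and the recency-weighted optimum at a lag of one, which is the same qualitative pattern described above. Other weighting functions could of course shift the optimum differently.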

2. The amount of redundancy. A 
second factor which affects the opti- 
mum lag is the total amount of redun- 
dancy in the series, as well as its 
distribution over the various steps. 
This point need not be labored. A 
lag which gains no predictability can 
be of no value. Clearly if the series is 
completely random (no redundancy), 
there is no advantage to the operator 
in lagging because he gains no redun- 
dancy or information when he does. 
Similarly, if only one item on either 
side gives any predictability, then no 
lag beyond one step is ever of any 
value. 

3. The amount of noise. One last 
factor which will affect the optimum 
lag for the operator is the amount of 
noise whose effects have to be offset. 
In our illustration, the term noise can 
refer to acoustical noise, or in the more 
general case, it can refer to any factor 
which decreases the probability of a 
correct perception of the item. The 
importance of noise is that it deter- 
mines the extent to which the operator 
needs the predictability which he can 
obtain from the series redundancy. 
Suppose, for example, that the task is 
being carried out under ideal circum- 
stances, where each number can be 
heard quite clearly. Under these cir- 
cumstances, the operator can respond 
immediately with essentially perfect 
accuracy, and it would make little 
difference to him whether the series is 
partially redundant or not. In either 
case he would not lag, since there is 
nothing to be gained by it. If, on 
the other hand, there is sufficient 
noise to make the perception imper- 
fect, then the operator will frequently 
find himself unsure of the proper re- 
sponse, and will need some additional 
redundancy to improve the probabil- 
ity of his making a correct response. 
Thus, in summary, we can assume 
that with this type of task operators 
will lag if there is some noise in the 
situation and if the series has sequen- 
tial dependencies in it. If these con- 
ditions are satisfied, then the exact 
amount of the lag will depend on his 
memory for the items presented and 
on the distribution of the redundancy 
in the series of events. 


SUMMARY 


In this paper we have presented 
some equations originating in informa- 
tion theory which show some valuable 
properties of uncertainty analysis. 
These equations are presented in a 
form which emphasizes the fact that 
an uncertainty analysis can be carried 
out so that all variables involved can 
be dealt with in a completely sym- 
metric manner. It is possible, for 
example, to talk about a single num- 
ber which represents the total contin- 
gent uncertainty in a multidimen- 
sional matrix, and then to demonstrate 
how parts or all of this contingent un- 
certainty can be made available for 
prediction of just one of the variables. 
From such equations it is possible to 
demonstrate certain relations which 
must hold between prediction of differ- 
ent variables in a single matrix. 

These relations have important 
implications for several areas of psy- 
chology, either in clarifying the mean- 
ing of terms used in statistical analy- 
sis or in showing how behavior ought 
to occur if it is to maximize the use of 
information. As one illustration, it is 
shown that information transmission 
in the multidimensional case cannot 
be as good when stimuli are predicted 
as when responses are predicted. 
Another illustration shows that the 
usual measure of redundancy of 
printed English can be interpreted 
accurately only when prediction is 
made unilaterally, but that maximum 
predictability of single letters occurs 
when prediction (on the basis of 
redundancy) is allowed to operate in 
both the forward and backward di- 
rections. A third illustration shows 
that when human operators have to 
respond to a continuing series which 
is nonrandom, optimum performance 
(with an error criterion) will occur 
only when the operator lags behind 
the series to which he is responding; 
furthermore, the optimum lag will be 
a function of the memory of the oper- 
ator, the amount of noise in the situa- 
tion, and the distribution of redun- 
dancy throughout the series. 

These examples are in no sense inclusive, 
and the implications of equations 
presented here range into many 
areas of psychology. These illustra- 
tions are sufficient, however, to point 
out several possible directions for re- 
search, and to clarify some concepts 
now in use in psychology. 


HEREDITY, ENVIRONMENT, AND THE QUESTION “HOW?” 1

Two or three decades ago, the so-called 
heredity-environment question 
was the center of lively controversy. 
Today, on the other hand, many psy- 
chologists look upon it as a dead issue. 
It is now generally conceded that both 
hereditary and environmental factors en- 
ter into all behavior. The reacting or- 
ganism is a product of its genes and its 
past environment, while present envi- 
ronment provides the immediate stimu- 
lus for current behavior. To be sure, it 
can be argued that, although a given 
trait may result from the combined in- 
fluence of hereditary and environmental 
factors, a specific difference in this trait 
between individuals or between groups 
may be traceable to either hereditary or 
environmental factors alone. The de- 
sign of most traditional investigations 
undertaken to identify such factors, 
however, has been such as to yield in- 
conclusive answers. The same set of 
data has frequently led to opposite con- 
clusions in the hands of psychologists 
with different orientations. 

Nor have efforts to determine the 
proportional contribution of hereditary 
and environmental factors to observed 
individual differences in given traits met 
with any greater success. Apart from 
difficulties in controlling conditions, such 
investigations have usually been based 
upon the implicit assumption that he- 
reditary and environmental factors com- 
bine in an additive fashion. Both ge- 
neticists and psychologists have repeat- 
edly demonstrated, however, that a more 
tenable hypothesis is that of interaction 
(15, 22, 28, 40). In other words, the 
nature and extent of the influence of 
each type of factor depend upon the 
contribution of the other. Thus the 
proportional contribution of heredity to 
the variance of a given trait, rather 
than being a constant, will vary under 
different environmental conditions. 
Similarly, under different hereditary 
conditions, the relative contribution of 
environment will differ. Studies designed 
to estimate the proportional contribution 
of heredity and environment, 
however, have rarely included measures 
of such interaction. The only possible 
conclusion from such research would 
thus seem to be that both heredity and 
environment contribute to all behavior 
traits and that the extent of their respective 
contributions cannot be specified 
for any trait. Small wonder that 
some psychologists regard the heredity-environment 
question as unworthy of 
further consideration! 

1 Address of the President, Division of General 
Psychology, American Psychological Association, 
September 4, 1957. 

But is this really all we can find out 
about the operation of heredity and en- 
vironment in the etiology of behavior? 
Perhaps we have simply been asking the 
wrong questions. The traditional ques- 
tions about heredity and environment 
may be intrinsically unanswerable. Psy- 
chologists began by asking which type 
of factor, hereditary or environmental, 
is responsible for individual differences 
in a given trait. Later, they tried to 
discover how much of the variance was 
attributable to heredity and how much 
to environment. It is the primary con- 
tention of this paper that a more fruit- 
ful approach is to be found in the ques- 
tion “How?” There is still much to be 
learned about the specific modus oper- 
andi of hereditary and environmental 
factors in the development of behavioral 
differences. And there are several current 
lines of research which offer promising 
techniques for answering the question 
“How?” 


VARIETY OF INTERACTION MECHANISMS 


Hereditary factors. If we examine 
some of the specific ways in which he- 
reditary factors may influence behavior, 
we cannot fail but be impressed by their 
wide diversity. At one extreme, we find 
such conditions as phenylpyruvic amen- 
tia and amaurotic idiocy. In these 
cases, certain essential physical pre- 
requisites for normal intellectual de- 
velopment are lacking as a result of 
hereditary metabolic disorders. In our 
present state of knowledge, there is no 
environmental factor which can com- 
pletely counteract this hereditary deficit. 
The individual will be mentally defec- 
tive, regardless of the type of environ- 
mental conditions under which he is 
reared. 

A somewhat different situation is 
illustrated by hereditary deafness, which 
may lead to intellectual retardation 
through interference with normal social 
interaction, language development, and 
schooling. In such a case, however, the 
hereditary handicap can be offset by 
appropriate adaptations of training procedures. 
It has been said, in fact, that 
the degree of intellectual backwardness 
of the deaf is an index of the state of 
development of special instructional fa- 
cilities. As the latter improve, the in- 
tellectual retardation associated with 
deafness is correspondingly reduced. 

A third example is provided by in- 
herited susceptibility to certain physi- 
cal diseases, with consequent protracted 
ill health. If environmental conditions 
are such that illness does in fact de- 
velop, a number of different behavioral 
effects may follow. Intellectually, the 
individual may be handicapped by his 
inability to attend school regularly. On 
the other hand, depending upon age of 
onset, home conditions, parental status, 
and similar factors, poor health may 
have the effect of concentrating the in- 
dividual’s energies upon intellectual pur- 
suits. The curtailment of participation 
in athletics and social functions may 
serve to strengthen interest in reading 
and other sedentary activities. Con- 
comitant circumstances would also de- 
termine the influence of such illness 
upon personality development. And it 
is well known that the latter effects 
could run the gamut from a deepen- 
ing of human sympathy to psychiatric 
breakdown. 

Finally, heredity may influence be- 
havior through the mechanism of social 
stereotypes. A wide variety of in- 
herited physical characteristics have 
served as the visible cues for identify- 
ing such stereotypes. These cues thus 
lead to behavioral restrictions or op- 
portunities and—at a more subtle level 
—to social attitudes and expectancies. 
The individual's own self concept tends 
gradually to reflect such expectancies. 
All of these influences eventually leave 
their mark upon his abilities and in- 
abilities, his emotional reactions, goals, 
ambitions, and outlook on life. 

The geneticist Dobzhansky illustrates 
this type of mechanism by means of 
a dramatic hypothetical situation. He 
points out that, if there were a culture 
in which the carriers of blood group AB 
were considered aristocrats and those of 
blood group O laborers, then the blood- 
group genes would become important 
hereditary determiners of behavior (12, 
p. 147). Obviously the association be- 
tween blood group and behavior would 
be specific to that culture. But such 
specificity is an essential property of 
the causal mechanism under considera- 
tion. 

More realistic examples are not hard 
to find. The most familiar instances 
occur in connection with constitutional 
types, sex, and race. Sex and skin pigmentation 
obviously depend upon heredity. 
General body build is strongly 
influenced by hereditary components, 
although also susceptible to environ- 
mental modification. That all these 
physical characteristics may exert a 
pronounced effect upon behavior within 
a given culture is well known. It is 
equally apparent, of course, that in dif- 
ferent cultures the behavioral correlates 
of such hereditary physical traits may 
be quite unlike. A specific physical cue 
may be completely unrelated to indi- 
vidual differences in psychological traits 
in one culture, while closely correlated 
with them in another. Or it may be 
associated with totally dissimilar behav- 
ior characteristics in two different cul- 
tures. 

It might be objected that some of the 
illustrations which have been cited do 
not properly exemplify the operation of 
hereditary mechanisms in behavior de- 
velopment, since hereditary factors en- 
ter only indirectly into the behavior in 
question. Closer examination, however, 
shows this distinction to be untenable. 
First it may be noted that the influence 
of heredity upon behavior is always in- 
direct. No psychological trait is ever 
inherited as such. All we can ever say 
directly from behavioral observations is 
that a given trait shows evidence of 
being influenced by certain “inheritable 
unknowns.” This merely defines a prob- 
lem for genetic research; it does not 
provide a causal explanation. Unlike 
the blood groups, which are close to the 
level of primary gene products, psy- 
chological traits are related to genes 
by highly indirect and devious routes. 
Even the mental deficiency associated 
with phenylketonuria is several steps 
removed from the chemically defective 
genes that represent its hereditary ba- 
sis. Moreover, hereditary influences 
cannot be dichotomized into the more 
direct and the less direct. Rather do 
they represent a whole “continuum of 
indirectness,” along which are found all 
degrees of remoteness of causal links. 
The examples already cited illustrate a 
few of the points on this continuum. 

It should be noted that as we proceed 
along the continuum of indirectness, the 
range of variation of possible outcomes 
of hereditary factors expands rapidly. 
At each step in the causal chain, there 
is fresh opportunity for interaction with 
other hereditary factors as well as with 
environmental factors. And since each 
interaction in turn determines the di- 
rection of subsequent interactions, there 
is an ever-widening network of possible 
outcomes. If we visualize a simple se- 
quential grid with only two alternatives 
at each point, it is obvious that there 
are two possible outcomes in the one- 
stage situation, four outcomes at the 
second stage, eight at the third, and 
so on in geometric progression. The 
actual situation is undoubtedly much 
more complex, since there will usually 
be more than two alternatives at any 
one point. 
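
The growth of this grid can be illustrated with a brief Python sketch. It assumes, for simplicity, the same number of alternatives at every point in the chain; the function name is introduced here purely for illustration.

```python
def possible_outcomes(stages, alternatives=2):
    """Number of distinct end points in a sequential grid that offers
    the same number of alternatives at every stage."""
    return alternatives ** stages


# Two alternatives per point: the one-, two-, and three-stage situations.
print([possible_outcomes(n) for n in (1, 2, 3)])

# More than two alternatives per point widens the network faster still.
print([possible_outcomes(n, alternatives=3) for n in (1, 2, 3)])
```

The first line reproduces the progression of two, four, and eight outcomes; the second shows how much faster the count rises with three alternatives at each point.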

In the case of the blood groups, 
the relation to specific genes is so close 
that no other concomitant hereditary or 
environmental conditions can alter the 
outcome. If the organism survives at 
all, it will have the blood group deter- 
mined by its genes. Among psycho- 
logical traits, on the other hand, some 
variation in outcome is always possible 
as a result of concurrent circumstances. 
Even in cases of phenylketonuria, intel- 
lectual development will exhibit some 
relationship with the type of care and 
training available to the individual. 
That behavioral outcomes show pro- 
gressive diversification as we proceed 
along the continuum of indirectness is 
brought out by the other examples 
which were cited. Chronic illness can 
lead to scholarly renown or to intel- 
lectual immaturity; a mesomorphic 
physique can be a contributing factor 
in juvenile delinquency or in the attainment 
of a college presidency! Published 
data on Sheldon somatotypes 
provide some support for both of the 
latter outcomes. 

Parenthetically, it may be noted that 
geneticists have sometimes used the 
term “norm of reaction” to designate 
the range of variation of possible out- 
comes of gene properties (cf. 13, p. 
161). Thus heredity sets the “norm” 
or limits within which environmental 
differences determine the eventual out- 
come. In the case of some traits, such 
as blood groups or eye color, this norm 
is much narrower than in the case of 
other traits. Owing to the rather differ- 
ent psychological connotations of both 
the words “norm” and “reaction,” how- 
ever, it seems less confusing to speak of 
the “range of variation” in this context. 

A large portion of the continuum of 
hereditary influences which we have de- 
scribed coincides with the domain of 
somatopsychological relations, as defined 
by Barker et al. (6). Under this 
heading, Barker includes “variations in 
physique that affect the psychological 
situation of a person by influencing the 
effectiveness of his body as a tool for 
actions or by serving as a stimulus to 
himself or others” (6, p. 1). Rela- 
tively direct neurological influences on 
behavior, which have been the tradi- 
tional concern of physiological psychol- 
ogy, are excluded from this definition, 
Barker being primarily concerned with 
what he calls the “social psychology of 
physique.” Of the examples cited in 
the present paper, deafness, severe ill- 
ness, and the physical characteristics 
associated with social stereotypes would 
meet the specifications of somatopsycho- 
logical factors. 

The somatic factors to which Barker 
refers, however, are not limited to those 
of hereditary origin. Bodily conditions 
attributable to environmental causes op- 
erate in the same sorts of somatopsy- 
chological relations as those traceable 
to heredity. In fact, heredity-environment 
distinctions play a minor part in 
Barker’s approach. 

Environmental factors: organic. Turn- 
ing now to an analysis of the role of 
environmental factors in behavior, we 
find the same etiological mechanisms 
which were observed in the case of 
hereditary factors. First, however, we 
must differentiate between two classes 
of environmental influences: (a) those 
producing organic effects which may in 
turn influence behavior and (b) those 
serving as direct stimuli for psychologi- 
cal reactions. The former may be illus- 
trated by food intake or by exposure to 
bacterial infection; the latter, by tribal 
initiation ceremonies or by a course in 
algebra. There are no completely satis- 
factory names by which to designate 
these two classes of influences. In an 
earlier paper by Anastasi and Foley (4), 
the terms “structural” and “functional” 
were employed. However, “organic” 
and “behavioral” have the advantage of 
greater familiarity in this context and 
may be less open to misinterpretation. 
Accordingly, these terms will be used in 
the present paper. 

Like hereditary factors, environmental 
influences of an organic nature can also 
be ordered along a continuum of indi- 
rectness with regard to their relation to 
behavior. This continuum closely paral- 
lels that of hereditary factors. One end 
is typified by such conditions as mental 
deficiency resulting from cerebral birth 
injury or from prenatal nutritional in- 
adequacies. A more indirect etiological 
mechanism is illustrated by severe mo- 
tor disorder—as in certain cases of cere- 
bral palsy—without accompanying in- 
jury to higher neurological centers. In 
such instances, intellectual retardation 
may occur as an indirect result of the 
motor handicap, through the curtail- 
ment of educational and social activi- 
ties. Obviously this causal mechanism 
corresponds closely to that of hereditary 
deafness cited earlier in the paper. 

Finally, we may consider an environ- 
mental parallel to the previously dis- 
cussed social stereotypes which were 
mediated by hereditary physical cues. 
Let us suppose that a young woman 
with mousy brown hair becomes trans- 
formed into a dazzling golden blonde 
through environmental techniques cur- 
rently available in our culture. It is 
highly probable that this metamorpho- 
sis will alter, not only the reactions of 
her associates toward her, but also her 
own self concept and subsequent be- 
havior. The effects could range all the 
way from a rise in social poise to a drop 
in clerical accuracy! 

Among the examples of environmen- 
tally determined organic influences 
which have been described, all but the 
first two fit Barker’s definition of somatopsychological 
factors. With the exception 
of birth injuries and nutritional 
deficiencies, all fall within the social 
psychology of physique. Nevertheless, 
the individual factors exhibit wide di- 
versity in their specific modus operandi 
—a diversity which has important prac- 
tical as well as theoretical implications. 

Environmental factors: behavioral. 
The second major class of environmen- 
tal factors—the behavioral as contrasted 
to the organic—are by definition direct 
influences. The immediate effect of 
such environmental factors is always a 
behavioral change. To be sure, some 
of the initial behavioral effects may 
themselves indirectly affect the indi- 
vidual’s later behavior. But this rela- 
tionship can perhaps be best concep- 
tualized in terms of breadth and per- 
manence of effects. Thus it could be 
said that we are now dealing, not with a 
continuum of indirectness, as in the case 
of hereditary and organic-environmental 
factors, but rather with a continuum of 
breadth. 

Social class membership may serve 
as an illustration of a relatively broad, 
pervasive, and enduring environmental 
factor. Its influence upon behavior de- 
velopment may operate through many 
channels. Thus social level may deter- 
mine the range and nature of intellec- 
tual stimulation provided by home and 
community through books, music, art, 
play activities, and the like. Even more 
far-reaching may be the effects upon in- 
terests and motivation, as illustrated by 
the desire to perform abstract intellectual 
tasks, to surpass others in competitive 
situations, to succeed in school, or 
to gain social approval. Emotional and 
social traits may likewise be influenced 
by the nature of interpersonal relations 
characterizing homes at different socio- 
economic levels. Somewhat more re- 
stricted in scope than social class, al- 
though still exerting a relatively broad 
influence, is amount of formal schooling 
which the individual is able to obtain. 

A factor which may be wide or narrow 
in its effects, depending upon concomit- 
ant circumstances, is language handi- 
cap. Thus the bilingualism of an adult 
who moves to a foreign country with 
inadequate mastery of the new language 
represents a relatively limited handicap 
which can be readily overcome in most 
cases. At most, the difficulty is one of 
communication. On the other hand, 
some kinds of bilingualism in childhood 
may exert a retarding influence upon 
intellectual development and may un- 
der certain conditions affect personality 
development adversely (2, 5, 10). A 
common pattern in the homes of immi- 
grants is that the child speaks one lan- 
guage at home and another in school, 
so that his knowledge of each language 
is limited to certain types of situations. 
Inadequate facility with the language of 
the school interferes with the acquisition 
of basic concepts, intellectual skills, and 
information. The frustration engendered 
by scholastic difficulties may in turn 
lead to discouragement and general dislike 
of school. Such reactions can be 
found, for example, among a number of 
Puerto Rican children in New York City 
schools (3). In the case of certain 
groups, moreover, the child’s foreign 
language background may be perceived 
by himself and his associates as a sym- 
bol of minority group status and may 
thereby augment any emotional malad- 
justment arising from such status (34). 

A highly restricted environmental in- 
fluence is to be found in the oppor- 
tunity to acquire specific items of in- 
formation occurring in a particular in- 
telligence test. The fact that such 
opportunities may vary with culture, 
social class, or individual experiential 
background is at the basis of the test 
user's concern with the problem of 
coaching and with “culture-free” or 
“culture-fair” tests (cf. 1, 2). If the 
advantage or disadvantage which such 
experiential differences confer upon cer- 
tain individuals is strictly confined to 
performance on the given test, it will 
obviously reduce the validity of the test 
and should be eliminated. 

In this connection, however, it is es- 
sential to know the breadth of the en- 
vironmental influence in question. A 
fallacy inherent in many attempts to 
develop culture-fair tests is that the 
breadth of cultural differentials is not 
taken into account. Failure to consider 
breadth of effect likewise characterizes 
certain discussions of coaching. If, in 
coaching a student for a college admis- 
sion test, we can improve his knowledge 
of verbal concepts and his reading com- 
prehension, he will be better equipped 
to succeed in college courses. His per- 
formance level will thus be raised, not 
only on the test, but also on the cri- 
terion which the test is intended to pre- 
dict. To try to devise a test which is 
not susceptible to such coaching would 
merely reduce the effectiveness of the 
test. Similarly, efforts to rule out cul- 
tural differentials from test items so as 
to make them equally “fair” to subjects
in different social classes or in different 
cultures may merely limit the useful- 
ness of the test, since the same cultural 
differentials may operate within the 
broader area of behavior which the test 
is designed to sample. 


METHODOLOGICAL APPROACHES 


The examples considered so far should 
suffice to highlight the wide variety of 
ways in which hereditary and environ- 
mental factors may interact in the 
course of behavior development. There 
is clearly a need for identifying ex- 
plicitly the etiological mechanism where- 
by any given hereditary or environmen- 
tal condition ultimately leads to a be- 
havioral characteristic—in other words, 
the “how” of heredity and environment. 
Accordingly, we may now take a quick 
look at some promising methodological 
approaches to the question “how.” 

Within the past decade, an increasing 
number of studies have been designed 
to trace the connection between specific 
factors in the hereditary backgrounds or 
in the reactional biographies of indi- 
viduals and their observed behavioral 
characteristics. There has been a defi- 
nite shift away from the predominantly 
descriptive and correlational approach 
of the earlier decades toward more de- 
liberate attempts to verify explanatory 
hypotheses. Similarly, the cataloguing 
of group differences in psychological 
traits has been giving way gradually to 
research on changes in group charac- 
teristics following altered conditions. 

Among recent methodological devel- 
opments, we have chosen seven as be- 
ing particularly relevant to the analysis 
of etiological mechanisms. The first rep- 
resents an extension of selective breed- 
ing investigations to permit the identifi- 
cation of specific hereditary conditions 
underlying the observed behavioral dif- 
ferences. When early selective breeding 
investigations such as those of Tryon
(36) on rats indicated that “maze learning
ability” was inherited, we were still
a long way from knowing what was 
actually being transmitted by the genes. 
It was obviously not “maze learning 
ability” as such. Twenty—or even ten 
—years ago, some psychologists would 
have suggested that it was probably 
general intelligence. And a few might 
even have drawn a parallel with the in- 
heritance of human intelligence. 

But today investigators have been 
asking: Just what makes one group of 
rats learn mazes more quickly than the 
other? Is it differences in motivation, 
emotionality, speed of running, general 
activity level? If so, are these behav- 
ioral characteristics in turn dependent 
upon group differences in glandular de- 
velopment, body weight, brain size, bio- 
chemical factors, or some other organic 
conditions? A number of recent and 
ongoing investigations indicate that at- 
tempts are being made to trace, at least 
part of the way, the steps whereby cer- 
tain chemical properties of the genes 
may ultimately lead to specified behav- 
ior characteristics. 

An example of such a study is pro- 
vided by Searle’s (31) follow-up of 
Tryon’s research. Working with the 
strains of maze-bright and maze-dull 
rats developed by Tryon, Searle demon- 
strated that the two strains differed in 
a number of emotional and motivational 
factors, rather than in ability. Thus 
the strain differences were traced one 
step further, although many links still 
remain to be found between maze learn- 
ing and genes. A promising methodo- 
logical development within the same 
general area is to be found in the recent 
research of Hirsch and Tryon (18). 
Utilizing a specially devised technique 
for measuring individual differences in 
behavior among lower organisms, these 
investigators launched a series of stud- 
ies on selective breeding for behavioral 
characteristics in the fruit fly, Drosophila.
Such research can capitalize
on the mass of available genetic knowl- 
edge regarding the morphology of Dro- 
sophila, as well as on other advantages 
of using such an organism in genetic 
studies. 

Further evidence of current interest 
in the specific hereditary factors which 
influence behavior is to be found in an 
extensive research program in progress 
at the Jackson Memorial Laboratory, 
under the direction of Scott and Fuller 
(30). In general, the project is con- 
cerned with the behavioral character- 
istics of various breeds and cross-breeds 
of dogs. Analyses of some of the data 
gathered to date again suggest that 
“differences in performance are pro- 
duced by differences in emotional, mo- 
tivational, and peripheral processes, and 
that genetically caused differences in 
central processes may be either slight 
or non-existent” (29, p. 225). In other 
parts of the same project, breed dif- 
ferences in physiological characteristics, 
which may in turn be related to be- 
havioral differences, have been estab- 
lished. 

A second line of attack is the explo- 
ration of possible relationships between 
behavioral characteristics and physio- 
logical variables which may in turn be 
traceable to hereditary factors. Re- 
search on EEG, autonomic balance, 
metabolic processes, and biochemical 
factors illustrates this approach. A 
lucid demonstration of the process of 
tracing a psychological condition to ge- 
netic factors is provided by the identifi- 
cation and subsequent investigation of 
phenylpyruvic amentia. In this case, 
the causal chain from defective gene, 
through metabolic disorder and conse- 
quent cerebral malfunctioning, to feeble- 
mindedness and other overt symptoms 
can be described step by step (cf. 32; 
33, pp. 389-391). Also relevant are 
the recent researches on neurological 
and biochemical correlates of schizophrenia
(9). Owing to inadequate
methodological controls, however, most 
of the findings of the latter studies must 
be regarded as tentative (19). 

Prenatal environmental factors pro- 
vide a third avenue of fruitful investi- 
gation. Especially noteworthy is the 
recent work of Pasamanick and his as- 
sociates (27), which demonstrated a 
tie-up between socioeconomic level, com- 
plications of pregnancy and parturition, 
and psychological disorders of the off- 
spring. In a series of studies on large 
samples of whites and Negroes in Balti- 
more, these investigators showed that 
various prenatal and paranatal disor- 
ders are significantly related to the oc- 
currence of mental defect and psychi- 
atric disorders in the child. An impor- 
tant source of such irregularities in the 
process of childbearing and birth is to 
be found in deficiencies of maternal diet 
and in other conditions associated with 
low socioeconomic status. An analysis 
of the data did in fact reveal a much 
higher frequency of all such medical 
complications in lower than in higher 
socioeconomic levels, and a higher fre- 
quency among Negroes than among 
whites. 

Direct evidence of the influence of 
prenatal nutritional factors upon subse- 
quent intellectual development is to be 
found in a recent, well controlled ex- 
periment by Harrell et al. (16). The 
subjects were pregnant women in low- 
income groups, whose normal diets were 
generally quite deficient. A dietary sup- 
plement was administered to some of 
these women during pregnancy and lac- 
tation, while an equated control group 
received placebos. When tested at the 
ages of three and four years, the off- 
spring of the experimental group ob- 
tained a significantly higher mean IQ 
than did the offspring of the controls. 

Mention should also be made of ani- 
mal experiments on the effects of such 
factors as prenatal radiation and neonatal
asphyxia upon cerebral anomalies
as well as upon subsequent behavior de- 
velopment. These experimental studies 
merge imperceptibly into the fourth ap- 
proach to be considered, namely, the in- 
vestigation of the influence of early ex- 
perience upon the eventual behavioral 
characteristics of animals. Research in 
this area has been accumulating at a 
rapid rate. In 1954, Beach and Jaynes 
(8) surveyed this literature for the Psy- 
chological Bulletin, listing over 130 ref- 
erences. Several new studies have ap- 
peared since that date (e.g., 14, 21, 24, 
25, 35). The variety of factors covered 
ranges from the type and quantity of 
available food to the extent of contact 
with human culture. A large number 
of experiments have been concerned 
with various forms of sensory depriva- 
tion and with diminished opportunities 
for motor exercise. Effects have been 
observed in many kinds of animals and 
in almost all aspects of behavior, includ- 
ing perceptual responses, motor activity, 
learning, emotionality, and social reac- 
tions. 

In their review, Beach and Jaynes 
pointed out that research in this area has 
been stimulated by at least four distinct 
theoretical interests. Some studies were 
motivated by the traditional concern 
with the relative contribution of matu- 
ration and learning to behavior develop- 
ment. Others were designed in an ef- 
fort to test certain psychoanalytic theo- 
ries regarding infantile experiences, as 
illustrated by studies which limited the 
feeding responses of young animals. A 
third relevant influence is to be found 
in the work of the European biologist 
Lorenz (23) on early social stimulation 
of birds, and in particular on the spe- 
cial type of learning for which the term 
“imprinting” has been coined. A rela- 
tively large number of recent studies 
have centered around Hebb’s (17) the- 
ory regarding the importance of early 
perceptual experiences upon subsequent 
performance in learning situations. All
this research represents a rapidly grow- 
ing and promising attack on the modus 
operandi of specific environmental fac- 
tors. 

The human counterpart of these ani- 
mal studies may be found in the com- 
parative investigation of child-rearing 
practices in different cultures and sub- 
cultures. This represents the fifth ap- 
proach in our list. An outstanding ex- 
ample of such a study is that by Whit- 
ing and Child (38), published in 1953. 
Utilizing data on 75 primitive societies 
from the Cross-Cultural Files of the 
Yale Institute of Human Relations, 
these investigators set out to test a 
number of hypotheses regarding the re- 
lationships between child-rearing prac- 
tices and personality development. This 
analysis was followed up by field ob- 
servations in five cultures, the results of 
which have not yet been reported (cf.
37).

Within our own culture, similar sur- 
veys have been concerned with the di- 
verse psychological environments pro- 
vided by different social classes (11). 
Of particular interest are the study by 
Williams and Scott (39) on the as- 
sociation between socioeconomic level, 
permissiveness, and motor development 
among Negro children, and the explora- 
tory research by Milner (26) on the 
relationship between reading readiness 
in first-grade children and patterns of 
parent-child interaction. Milner found 
that upon school entrance the lower- 
class child seems to lack chiefly two ad- 
vantages enjoyed by the middle-class 
child. The first is described as “a warm 
positive family atmosphere or adult-re- 
lationship pattern which is more and 
more being recognized as a motivational 
prerequisite of any kind of adult-con- 
trolled learning.” The lower-class chil- 
dren in Milner’s study perceived adults 
as predominantly hostile. The second 
advantage is an extensive opportunity 
to interact verbally with adults in the
family. The latter point is illustrated 
by parental attitudes toward mealtime 
conversation, lower-class parents tend- 
ing to inhibit and discourage such con- 
versation, while middle-class parents en- 
courage it. 

Most traditional studies on child-rear- 
ing practices have been designed in 
terms of a psychoanalytic orientation. 
There is need for more data pertaining 
to other types of hypotheses. Findings 
such as those of Milner on opportunities 
for verbalization and the resulting ef- 
fects upon reading readiness represent 
a step in this direction. Another pos- 
sible source of future data is the ap- 
plication of the intensive observational 
techniques of psychological ecology de- 
veloped by Barker and Wright (7) to 
widely diverse socioeconomic groups. 

A sixth major approach involves re- 
search on the previously cited soma- 
topsychological relationships (6). To 
date, little direct information is avail- 
able on the precise operation of this 
class of factors in psychological devel- 
opment. The multiplicity of ways in 
which physical traits—whether heredi- 
tary or environmental in origin—may 
influence behavior thus offers a rela- 
tively unexplored field for future study. 

The seventh and final approach to 
be considered represents an adaptation 
of traditional twin studies. From the 
standpoint of the question “How?” 
there is need for closer coordination be- 
tween the usual data on twin resem- 
blance and observations of the family 
interactions of twins. Available data 
already suggest, for example, that close- 
ness of contact and extent of environ- 
mental similarity are greater in the case 
of monozygotic than in the case of di- 
zygotic twins (cf. 2). Information on 
the social reactions of twins toward 
each other and the specialization of 
roles is likewise of interest (2). Especially
useful would be longitudinal studies
of twins, beginning in early infancy
and following the subjects through 
school age. The operation of differen- 
tial environmental pressures, the devel- 
opment of specialized roles, and other 
environmental influences could thus be 
more clearly identified and correlated 
with intellectual and personality changes 
in the growing twins. 

Parenthetically, I should like to add 
a remark about the traditional applica- 
tions of the twin method, in which per- 
sons in different degrees of hereditary 
and environmental relationships to each 
other are simply compared for behav- 
ioral similarity. In these studies, at- 
tention has been focused principally 
upon the amount of resemblance of 
monozygotic as contrasted to dizygotic 
twins. Yet such a comparison is par- 
ticularly difficult to interpret because of 
the many subtle differences in the en- 
vironmental situations of the two types 
of twins. A more fruitful comparison 
would seem to be that between dizygotic 
twins and siblings, for whom the he- 
reditary similarity is known to be the 
same. In Kallmann’s monumental re- 
search on psychiatric disorders among 
twins (20), for example, one of the most 
convincing bits of evidence for the op- 
eration of hereditary factors in schizo- 
phrenia is the fact that the degrees of 
concordance for dizygotic twins and for 
siblings were practically identical. In 
contrast, it will be recalled that in in- 
telligence test scores dizygotic twins re- 
semble each other much more closely 
than do siblings—a finding which re- 
veals the influence of environmental fac- 
tors in intellectual development. 


SUMMARY 


The heredity-environment problem is 
still very much alive. Its viability is
assured by the gradual replacement of
the questions, “Which one?” and “How
much?” by the more basic and appropriate
question, “How?” Hereditary influences,
as well as environmental factors
of an organic nature, vary along a
“continuum of indirectness.” The more 
indirect their connection with behavior, 
the wider will be the range of variation 
of possible outcomes. One extreme of 
the continuum of indirectness may be 
illustrated by brain damage leading to 
mental deficiency; the other extreme, by 
physical characteristics associated with 
social stereotypes. Examples of factors 
falling at intermediate points include 
deafness, physical diseases, and motor 
disorders. Those environmental factors 
which act directly upon behavior can be 
ordered along a continuum of breadth 
or permanence of effect, as exemplified 
by social class membership, amount of 
formal schooling, language handicap, 
and familiarity with specific test items. 

Several current lines of research offer 
promising techniques for exploring the 
modus operandi of hereditary and envi- 
ronmental factors. Outstanding among 
them are investigations of: (a) heredi- 
tary conditions which underlie behav- 
ioral differences between selectively bred 
groups of animals; (b) relations between
physiological variables and individual
differences in behavior, especially
in the case of pathological deviations; 
(c) role of prenatal physiological factors 
in behavior development; (d) influence 
of early experience upon eventual be- 
havioral characteristics; (e) cultural 
differences in child-rearing practices in 
relation to intellectual and emotional 
development; (f) mechanisms of soma- 
topsychological relationships; and (g) 
psychological development of twins from 
infancy to maturity, together with ob- 
servations of their social environment. 
Such approaches are extremely varied 
with regard to subjects employed, na- 
ture of psychological functions studied, 
and specific experimental procedures fol- 
lowed. But it is just such heterogeneity 
of methodology that is demanded by 
the wide diversity of ways in which hereditary
and environmental factors interact
in behavior development.


Beginning with a rather incidental
observation reported by May (14) in 
1948, a number of investigators (6, 22, 
25) have shown that in a conditioning 
set-up, involving a noxious uncondi- 
tioned stimulus, the incidence of inter- 
trial (spontaneous) responses will in- 
crease markedly if such responses are 
allowed to function instrumentally, i.e., 
to “avoid,” in the sense of postponing, 
the next “trial.” This effect is not ob- 
tainable if the unconditioned stimulus is 
preceded by a warning signal (27), but 
does occur regardless of whether the 
instrumental intertrial response is the 
same as or different from the response 
produced by the unconditioned stimulus 
(13; cf. 18). The increased incidence 
of intertrial responses under the condi- 
tions described is in contrast to the 
previously reported observation that in- 
tertrial responses occur progressively 
less frequently in a conditioning situa- 
tion wherein they cannot function in- 
strumentally, i.e., where the intertrial 
interval is fixed or at least variable only 
by the experimenter (7, 19). 

The discovery that an instrumental 
procedure of the kind described will 
increase the incidence of intertrial re- 
sponses followed logically enough from 
the finding of Hunter (11), Brogden, 
Lipman, and Culler (3), and others that 
a response made to a warning (condi- 
tioned) stimulus is more readily and 
more reliably fixated if, when such a re- 
sponse occurs, the unconditioned stimu- 
lus is omitted, rather than being paired 
willy-nilly (as in “classical” conditioning)
with the conditioned stimulus. And
from this and related work it was soon
evident that so-called avoidance learning
is not a matter of simple (stimulus-substitution)
conditioning, but that it
involves two distinct stages or “levels”:
when a signal is paired with some painful
stimulus, what an organism learns
first is to be afraid; and then it learns,
perhaps by a quite different process,
what to do about its fear (16, 17). Yet,
with respect to the learning of instrumental
(avoidance) intertrial responses,
there has been a tendency to adhere to
the older one-stage stimulus-substitution
type of interpretation. The purpose
of the present paper is to show, on
both empirical and logical grounds, how
much more satisfactory is a two-factor
type of explanation.

1 On leave of absence, 1956-57, by benefit
of a Ford Fellowship, from the Department
of Psychology, American University of Beirut.


PREVIOUS RESEARCH AND THEORY 


In the winter of 1946, Hart West- 
brook made what seems to have been 
the first attempt to set up an experi- 
mental situation in which intertrial re- 
sponses might function instrumentally 
in the manner already indicated. How- 
ever, since negative results were ob- 
tained (probably because training was 
not continued long enough; see later 
section on Interval-Avoidance Learn- 
ing), the details of this study need not 
be described here. Then, as already 
indicated, two years later May reported 
the occurrence of intertrial response
learning in a situation designed to in- 
vestigate quite a different problem. This 
decidedly incidental, but striking, find- 
ing was reported thus: 


In the above experiments the swinging door 
over the barrier [separating the two sides of 
a shuttle box] was never locked. The rats
could cross and recross at will. An indefinite 
number of crossings was possible during the 
training and testing trials. Sixty to 90 sec. 
were allowed between the last spontaneous 
crossing and the next training or testing trial. 
Thus, on the training trials an animal could 
avoid shock for a considerable time by cross- 
ing and recrossing at intervals of less than 60 
sec. Ten of the 12 experimental animals 
(Group A—door closed) and 11 of the con- 
trols learned this form of adaptive behavior 
and used it fairly regularly both on the train- 
ing and on the test trials. Very few of the 
animals in B-Group (door open), on the other 
hand, learned it (14, p. 71). 


In the summary of the study cited, 
May does not allude to this finding and 
elsewhere in his paper makes no attempt 
to interpret or analyze it. However, in 
a study by Bugelski and Coyer (6), 
briefly reported in 1950, the problem 
was attacked quite directly and with a 
theoretical issue in mind, implied by the 
title, “Temporal Conditioning vs. Anx- 
iety Reduction in Avoidance Learning.” 
A few years later, Bugelski described 
the procedure employed thus: 


Every 15 seconds a shock was applied in suf- 
ficient strength to force a rat over a hurdle. 
With repeated trials, the animals came to give 
almost perfect performance, jumping the hurdle 
in the general interval of 10 to 15 seconds, 
thus avoiding shock (5, p. 68). 


In accompanying graphs Bugelski 
shows that, although rats learned to 
“shuttle” more readily under the condi- 
tions indicated (i.e., when the shock- 
shock interval was 15 seconds), another 
group of animals also solved the prob- 
lem with a shock-shock interval of 60 
seconds. Say Bugelski and Coyer of 
their study: 


The results support Hull’s account of avoid- 
ance behavior (Principles) as antedating re- 
sponses developing from escape reactions. 
Anxiety reduction and expectancy explana- 
tions need not be invoked for the responses 
studied (6, p. 265). 


By this, these writers apparently mean 
that shuttling, when it occurs between 
trials, is just the conditioned (antedating)
form of the escape reaction elicited
by the shock which now, however, is 
produced by the experimental situation 
alone (without shock). Such an inter- 
pretation is, in a sense, classically Pav- 
lovian and, as these writers point out, 
does not require that fear be posited as 
an intervening variable or that fear re- 
duction be a form of reinforcement. 
Reasons for doubting the adequacy of 
this view will be given presently. 

Then, in 1953, Sidman, using a dif- 
ferent type of response and training 
procedure, also reported positive results: 


White rats were the experimental organisms, 
with lever pressing selected as the avoidance 
response. Shocks of a fixed 0.2-sec. duration 
were given to the animal through a grid floor 
at regular [commonly 5-sec.] intervals unless 
the lever was depressed. Each lever depres- 
sion reset the timer controlling the shock, thus 
delaying its appearance [by 20 seconds or so]. 
. . . Only the initial downward press on the 
lever reset the timer; holding the lever down 
had no effect upon the occurrence of the shock. 

With no other contingencies between avoid- 
ance behavior and exteroceptive stimulation in- 
volved, approximately 50 animals have been 
successfully conditioned. . . . A striking char- 
acteristic of the initial curves is the abrupt- 
ness with which the rate increases from its 
initial near-zero level. . . . With continued 
training the rate remains relatively stable not 
only within but also between sessions. Rates 
as high as 17 responses/min have been main- 
tained by some animals during sessions total- 
ing over 24 hours, with variations no greater 
than 0.1 responses/min appearing between the 
average rate for each session (25, pp. 157-158). 


Sidman was inclined to interpret this 
striking effect as follows: 


The behavior generated by this procedure 
can be explained by a model which holds that 
avoidance responding increases in rate at the 
expense of other behavior that is depressed by 
shock. An equivalent statement, in reinforce- 
ment terms, is that the avoidance response is 
strengthened when it terminates incompatible 
behavior that has been paired with shock (8, 
23) [i.e., interval responses, other than the
“correct” one, which do not postpone shock, 
get punished and thus tend to arouse fear]. 
Several lines of evidence indicate that the 
avoidance rate is not simply some form of
temporal conditioning in which the responses 
are triggered-off by the passage of a time in- 
terval (25, p. 158). 
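
The schedule described in the first of Sidman's quoted passages can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the function name `shock_times` is hypothetical, the timer is assumed to start at session onset, the 5-second shock-shock interval and the 20-second response-shock interval are taken from the bracketed values in the quotation, and the 0.2-second shock duration is ignored. Only the onset of a lever press is represented, since holding the lever down is said to have no effect.

```python
def shock_times(response_times, session_length,
                shock_shock=5.0, response_shock=20.0):
    """Return the times (in seconds) at which shocks would be delivered.

    Assumptions (not stated in the source): the timer starts at time 0,
    shocks are treated as instantaneous, and a shock that falls due at the
    same instant as a response is delivered before the timer is reset.
    """
    shocks = []
    next_shock = shock_shock  # first shock is due one shock-shock interval in
    for r in sorted(response_times):
        if r >= session_length:
            break
        # Deliver every shock that falls due before (or at) this response.
        while next_shock <= r:
            shocks.append(next_shock)
            next_shock += shock_shock
        # Each lever-press onset resets the timer, postponing the next shock.
        next_shock = r + response_shock
    # After the last response, shocks resume at the shock-shock interval.
    while next_shock < session_length:
        shocks.append(next_shock)
        next_shock += shock_shock
    return shocks


if __name__ == "__main__":
    print(shock_times([], 20.0))                        # no responding
    print(shock_times([4.0, 19.0, 34.0, 49.0], 60.0))   # steady responding
    print(shock_times([3.0], 40.0))                     # one response, then none
```

Under these assumptions an animal that never responds is shocked every 5 seconds, whereas one that responds at least once every 20 seconds is never shocked, although no exteroceptive signal marks the approach of the next shock.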


Because of the brevity of the paper 
just cited, the precise nature of Sid- 
man’s theoretical position was not en- 
tirely clear, so we turn to the paper by 
Schoenfeld which Sidman cites. This 
paper is long and intricate and cannot 
be summarized here; but the following 
sentences, from the summary, are per- 
haps the most relevant ones for present 
purposes: 


Following several objections raised against 
[the] anxiety-reduction hypothesis, an alterna- 
tive explanation of avoidance is offered in 
which the response is conceived to be always 
a stimulus-terminating one. . . . It is argued 
that the avoidance response terminates [aver- 
sive] stimulus compounds in which propriocep- 
tive and tactile stimuli are important com- 
ponents. . . . If this formulation is correct, 
avoidance conditioning proves to be a form of 
escape training, and its avoidance function is 
incidental to its stimulus-termination function 
(23, p. 97). 


That the Schoenfeld position is, how- 
ever, open to varying interpretations is 
indicated by the following comment by 
Kamin: 


The usual assumption has been that spon- 
taneous responses result from conditioning to 
generalized apparatus cues (cf. 9, p. 76; 23, 
p. 89) (12, p. 71).²


While the theoretical picture was thus 
somewhat confused as of 1954, it is fair 
to say that the prevailing tendency was 
to explain the reinforcement of instru- 


2Solomon and Brush (33, p. 266) allude to 
a study of intertrial avoidance learning re- 
ported (as a personal communication) by 
Milner in 1955. In only one respect (which 
will be mentioned later) does this study ex- 
tend what had been learned from earlier ex- 
periments. Solomon and Brush also give a 
comprehensive review of research on avoid- 
ance learning where the CS-US interval is 
systematically varied (with different subjects). 
However, this line of work, while germane, is 
not directly relevant to the present discussion 
and will not be specifically cited. 
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mental intertrial responses as not involv- 
ing the reduction of a temporally condi- 
tioned fear (or anxiety), and perhaps 
not involving the reduction of fear at all. 


EXPERIMENTAL DISPROOF OF CLASSICAL
CONDITIONING INTERPRETATION 


The fear-reduction hypothesis has 
been so successful in accounting for 
avoidance learning in general, and in 
resolving paradoxes generated by other 
theories, that there is strong presump- 
tion that it can also explain the learn- 
ing of instrumental intertrial responses. 
But at once there is at least a super- 
ficial difficulty. In “ordinary” avoid- 
ance learning, i.e., where there is a defi- 
nite and specific warning signal, it is 
simple and reasonable to suppose (a) 
that fear becomes conditioned to this 
signal and (b) that any response which
turns the signal off (and averts the un- 
conditioned stimulus) will be reinforced 
by fear reduction. But, in the case of 
instrumental intertrial responses, there 
is no such specific warning signal, so 
that the fear-reduction hypothesis, if ap- 
plicable at all, must apply in some rather 
subtle and special way. In later sec- 
tions, this inference will be considered 
in detail; but first a simple experiment 
will be reported which clearly shows the 
inadequacy of the classical conditioning 
type of explanation. 

Kamin has already been quoted as 
saying: “The usual assumption has been 
that spontaneous responses result from 
conditioning to generalized apparatus 
cues.” If the classical conception of 
conditioning were valid, then, every time 
a subject experienced electric shock in 
a given experimental situation and made 
a specific response thereto, there would 
be an increased tendency for the sub- 
ject to make that response to the stimu- 
lus compound which immediately pre- 
cedes the shock. Since it is the experi- 
mental situation as a whole, rather than 
a more specific stimulus, which here 
precedes shock and thus becomes, in
effect, the conditioned stimulus, the sub- 
ject should soon start making the re- 
sponse in question to that situation 
“spontaneously,” i.e., without a specific 
cue or signal. 

On the other hand, if one accepts the 
two-factor version of avoidance learn- 
ing, one must assume that, in order for 
intertrial responses to become more fre- 
quent, they must be reinforced by the 
occurrence of fear reduction. In the 
one case, mere contiguity of stimulation 
(situation + shock) would provide the 
conditions necessary for the increased 
occurrence of intertrial responses; 
whereas, in the other case, something 
more would be required: these responses 
would have to be in some sense func- 
tional, instrumental, problem solving. 
In the experiment to be described 
shortly, these two hypotheses are put to 
the test; but, before describing that ex- 
periment, it will be necessary to say a 
word about apparatus and procedure. 

In the instrumental intertrial learn- 
ing experiments already alluded to, 
Westbrook and Schaefer used a wheel- 
turning response, May and Bugelski and 
Coyer used shuttling, and Sidman used 
bar pressing. In casting about for an 
avoidance response that might be more 
“natural” for rats than either bar press- 
ing or wheel turning and also free of the 
conflict that necessarily exists (at least 
in the beginning) in a shuttling set-up, 
one of the present writers (J. D. K.) 
tried out a revolving cage similar to that 
employed in the experiment of Brogden, 
Lipman, and Culler (3) and found it 
highly satisfactory. With the response 
now merely a short run (which revolves 
the cage but otherwise gets the subject 
nowhere), rats show excellent interval- 
avoidance learning; and with a few re- 
finements the apparatus has proved it- 
self to be reliable, flexible, and well 
adapted to the investigation of a wide 
range of related problems. The
refinements are these: (a) a simple, silent
“lock” which allows the cage to move
easily in one direction but not in the
other; (b) a device which quietly resets
an interval timer whenever the cage is 
turned (in the “correct” direction) by 
a minimum of two or three inches; and 
(c) articulation of the revolving cage 
with a cumulative graphic recorder (de- 
scribed elsewhere, 15). Thus, most of 
the procedure and all the recording are 
made automatic and put beyond the 
influence of experimenter-produced
variables.

Figure 1 shows the performance typi- 
cal of an albino rat (a 140-day-old 
male) in the apparatus here described, 
under the following conditions. For 
one hour on three successive days this 
rat had been put into the revolvable 
cage, with no shock, and allowed to 
habituate. What little spontaneous run- 
ning occurred in the beginning soon dis- 
appeared, so that on the third day the 
rat, in the course of an hour, turned the 
cage less than two revolutions, as can 
be seen by the virtually horizontal “base- 
line” in the lower left-hand corner of 
Fig. 1. Then, after the three daily ses- 
sions of habituation, the rat was sub- 
jected to daily training sessions, again 
of one-hour duration each, under the 
following conditions: a light electric 
shock from the grillwork which consti- 
tutes the “floor” of the revolvable cage 
came on every 20 seconds and remained 
on until the rat ran and revolved the 
cage far enough (roughly two inches) 
to turn off the shock, unless such a run 
occurred between shocks, in which event 
the timing mechanism reset and started 
another 20-second cycle, without shock. 
As will be seen in Fig. 1, this subject, 
during the first experimental session, re- 
ceived 37 shocks (indicated by the la- 
teral spurs on the cumulative response 
line). Since a total of 180 shocks (3 to
the minute for 60 minutes) would have
been received during the experimental
session if no interval running had
occurred, it is clear that 143 shocks were
thus avoided. On the second day the
number of shocks not avoided dropped
to 15, and on the third day this number
dropped still further to 11. Here,
manifestly, is highly successful avoidance
learning, in a situation that is basically
simple both as regards apparatus and
procedure.

EXPERIMENTAL SESSIONS
Fig. 1. Cumulative records of an albino rat’s performance in a revolvable cage when between-
trial (“spontaneous”) runs postpone the next shock by 20 seconds. An activity baseline,
representing the amount of running by the subject during the third of three daily habituation sessions
of one hour each, under conditions of no shock, is shown in Graph 3-H. Graphs 1-T, 2-T, and
3-T represent the amount of running during hourly sessions on each of three successive
“training” days. Shocks were applied only when the subject failed, for 20 seconds, to revolve the
cage by the prescribed amount (two or three inches). Shocks are indicated by the lateral spurs
on the underside of the graphs.
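
The shock schedule described above can be made concrete with a short simulation. What follows is a minimal sketch in Python, not a description of the authors' apparatus: the function name `shock_onsets`, its parameters, and the simplification that every shock is escaped at once (so that the 20-second timer restarts at shock onset) are assumptions introduced here for illustration. The figures 180, 37, and 143 are those given in the text.

```python
def shock_onsets(run_times, session_s=3600.0, interval_s=20.0):
    """Return shock-onset times (s) for one session of the schedule.

    run_times: times (s) of interval runs (hypothetical data).
    Assumption: each shock is escaped immediately, so the timer
    restarts at shock onset; real escape latencies are ignored.
    """
    runs = sorted(run_times)
    shocks = []
    i = 0
    next_shock = interval_s
    while next_shock <= session_s:
        if i < len(runs) and runs[i] < next_shock:
            # A run between shocks resets the timer without shock.
            next_shock = runs[i] + interval_s
            i += 1
        else:
            # No run for a full interval: the shock comes on.
            shocks.append(next_shock)
            next_shock += interval_s
    return shocks


# With no interval running, shock comes on 3 times a minute for an hour.
possible = len(shock_onsets([]))
received_day_1 = 37  # reported for the first training session
avoided_day_1 = possible - received_day_1
print(possible, avoided_day_1)
```

Under these assumptions a subject that never runs between shocks receives all 180, while one that runs at least once in every 20-second interval receives none; the 143 shocks avoided on the first day is simply the difference between the 180 possible and the 37 received.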

But such results, taken alone, are 
somewhat ambiguous. There are several 
ways in which they might be interpreted. 
In the absence of any other evidence to 
the contrary, one might infer, with Bu- 
gelski and Coyer, that the between- 
shock runs are simply conditioned (an- 
tedating) versions of the reaction of the 
subject to the shock itself. Lacking a 
more specific conditioned stimulus, such 
as a light or buzzer to herald the ap- 
proach of shock, the total situation 
might here be thought of as the con- 
ditioned stimulus, which, being con- 
tinuously present between shock pres- 
entations, elicits numerous, somewhat 
irregular occurrences of the running 
response. 

Or, consider another possibility. 
There is an obvious resemblance be- 
tween the curves shown in Fig. 1 and 
those commonly reported, by Skinner 
(31) and others, where the subject, mo- 
tivated by hunger, gets reinforced only 
intermittently (with food) for pushing 
a bar but continues to make this re- 
sponse at a fairly rapid and regular rate 
between reinforcements. Unlike the 
hunger, which is continuous, the shock 
in this situation is discontinuous; but 
fear is presumably present much of the 
time while the subject is in the revolva- 
ble cage. So one might think of the 
response to shock-plus-fear as generaliz- 
ing to fear alone and thus occurring 
many times on the basis of the sheer 
habit strength (or reflex reserve)
generated solely by the reinforcement
provided by escape from the shock-plus-fear
on the occasions when shock was
presented. Here, as in the Bugelski- 
Coyer type of explanation, there would 
be no need to posit any reinforcement 
of the interval responses themselves, 
through fear reduction; they could be 
thought of instead as pure extinction 
responses. 

Fortunately, a simple variation of the 
procedure which leads to results such as 
those shown in Fig. 1 provides a fairly 
crucial means of determining whether 
the interval responses which are here 
exhibited are reinforced only on those 
occasions when shock is received or are 
secondarily reinforced, by fear reduc- 
tion, each time they occur. Typical re- 
sults for such a procedure are shown in 
Fig. 2. Here an animal, which may be 
called C (control) and which had had 
three daily sessions of habituation just 
as had E (the animal used to obtain the 
curves shown in Fig. 1), was subjected 
to daily training sessions under the fol- 
lowing conditions: it was put into the 
revolvable cage and given shocks at ex- 
actly the same points in time as those at 
which E had failed to “avoid” and had 
been shocked. In other words, since 
Animal E had received 37 shocks on the 
first day of training, Animal C received 
the same number of shocks distributed 
in time in exactly the same way; the 
same sort of procedure was repeated on 
the two remaining days. As a result 
of receiving the first 10 or 12 shocks, 
Animal C actually did somewhat more 
running than did Animal E, but soon 
thereafter the performance of the two 
animals became strikingly different. 
During the course of Session 1-T (first 
training day), Animal C, although re- 
ceiving the same number (and approxi- 
mate duration) of shocks, turned the 
cage only 94 revolutions as opposed to 
198 revolutions for Animal E; and, in 
Session 3-T, the discrepancy had be- 
come much wider still: only 10 revolu- 
tions for Animal C as opposed to 249 
revolutions for Animal E. As Curve
3-T shows, when the shock came on,
Animal C would still run far enough to
turn the shock off (or perhaps a little
further), but there was virtually no
interval running. This animal had learned
that the shocks were not avoidable, that
they came regardless of what was done
in the intervals between shocks; so,
there being no gain from such behavior,
the animal, sensibly enough, abandoned
it. This outcome, substantiated by
results obtained with three other pairs of
animals, supports the hypothesis that
interval responses, if they are to be
perpetuated, must be reinforced in some
manner other than by the reinforcement
provided by the mere pairing of
situation and shock.

EXPERIMENTAL SESSIONS
Fig. 2. Cumulative records of an albino rat’s performance in a revolvable cage when a run
(of two or three inches) turns off the shock (received from the cage floor) but, when occurring
spontaneously, does not postpone the next shock. The four graphs are labeled as in Fig. 1.
The shocks, equal in intensity and number, are distributed in time just as were those received
by the subject whose performance is shown in Fig. 1. This type of procedure provides a control
for certain hypotheses, cited in the text, which have been advanced to account for the type
of interval-response avoidance learning shown in Fig. 1.
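
The logic of the control procedure can be put in a few lines. The sketch below is illustrative only: the function name `yoked_control_shocks` and its arguments are hypothetical, and the only data used are the revolution counts reported in the text. The point it makes is that the control animal's own interval runs play no part in determining when it is shocked.

```python
def yoked_control_shocks(e_shock_times, c_run_times):
    """Shock-onset times (s) for control Animal C.

    Animal C is shocked at exactly the times at which Animal E was
    shocked. c_run_times (C's own interval runs) is deliberately
    ignored: under this procedure running postpones nothing.
    """
    return sorted(e_shock_times)


# Revolutions per session as reported in the text.
revolutions = {"1-T": {"E": 198, "C": 94}, "3-T": {"E": 249, "C": 10}}
for session, r in revolutions.items():
    # Running by C relative to E in the same session.
    print(session, "C/E ratio:", round(r["C"] / r["E"], 2))
```

On the reported counts, Animal C did a little under half as much running as Animal E in Session 1-T and only a small fraction as much in Session 3-T, although both animals received the same shocks at the same times.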


INTERVAL AVOIDANCE LEARNING FURTHER
ANALYZED AND INTERPRETED


From the empirical results just
reported, it is evident that intertrial
avoidance learning cannot be adequately
explained in terms of classical
conditioning. Some more intricate interpretation
is obviously called for; the one which 
most readily suggests itself involves 
the now well authenticated concept of 
stimulus trace (1, 4). In an experi- 
mental situation of the kind under dis- 
cussion, each response, whether forced 
(by shock) or spontaneous, presumably 
sets up a neural reverberation (“immedi- 
ate memory”) which decays to zero (or 
at least to asymptote) in roughly 30 
seconds. If, therefore, electric shock 
be delivered to the subject whenever, 
let us say, the 20-second point on the 
stimulus trace is reached, one would ex- 
pect that fear would become conditioned 
to this point on the trace and, in gen- 
eralizing forward, would motivate the 
subject to make the response in question 
during the intertrial (20-second) inter- 
val, thus averting the impending shock. 
And, since each new occurrence of the 
response would, so to say, “reset” the
trace, there would be a reduction in fear 
and hence a reinforcement of the
response itself (cf. 7, 20).³

Reasonable as it is a priori, this hy- 
pothesis did not, at first, seem to have 
much empirical justification. The West- 
brook study was undertaken with this 
hypothesis explicitly in mind; and when 
preliminary work failed to reveal the
expected effect, i.e., showed no tendency
for the intertrial response to become 
temporally conditioned, the study was 
discontinued and the hypothesis ques- 
tioned. May, as already noted, was ap- 
parently the first to obtain good inter- 
trial avoidance learning, but he made 
no attempt to account for it theoreti- 
cally. Bugelski and Coyer, to be sure, 
reported (6) that, with repeated trials, 
the animals came to give almost perfect 
performance, jumping the hurdle in the 
general interval of 10 to 15 seconds, 
thus avoiding shock; but these writers 
advanced the now untenable antedating
hypothesis (of Pavlov and Hull).
Sidman, finding no evidence of temporal
conditioning in his own early researches 
(see Fig. 3), adopted an explanation, 
derived from Schoenfeld, to the effect 
that the “correct” interval response is 
learned because other intertrial responses 


3 Imagine an experimental situation in which 
a relatively loud tone came on immediately 
after the subject had responded in a specified 
way and then gradually softened—or a patch 
of light which, upon occurrence of the re- 
sponse, became suddenly large and then grad- 
ually diminished in size. Suppose also that 
the subject never received shock when the 
stimulus was large (or loud) but was shocked 
when it diminished beyond a certain point. It 
would be in no way surprising if the subject, 
under these conditions, learned to use such a 
stimulus as the basis of successful avoidance 
behavior. Our conjecture is that, in interval- 
avoidance learning of the kind under discus- 
sion, something comparable is involved, the 
principal difference being that the changing 
cue, instead of being an external stimulus, is 
a stimulus trace produced by the last
preceding instrumental (overt) response.

get punished (in the sense of failing to
postpone shock) and therefore, relative 
to the “correct” response, become in- 
hibited. Moreover, since the occurrence 
of the correct response stops or pre- 
vents the occurrence of other (fear- 
producing) responses, this response will 
have a tendency to reduce fear and thus 
be reinforced—an interpretation which 
makes no use whatever of the stimulus- 
trace concept. (Sidman did not, to be 
sure, use precisely the above terminol- 
ogy; but, in light of considerations to 
be advanced in the next section, this 
paraphrasing seems legitimate.) 
Schaefer, in the recent study already 
cited, likewise failed to find evidence of 
temporal conditioning in his subjects— 
in fact, their intertrial responses tended 
decidedly to pile up (as in Fig. 3) to- 
ward the beginning of the intertrial 
interval rather than toward the end 
thereof. And on the basis of this find- 
ing, Schaefer advanced an explanation 
of intertrial avoidance learning which, 
while in effect something like the one 
put forward by Sidman, is at least
differently phrased. According to Schaefer’s
conjecture, what the subject in an ex- 
periment of this kind does is to dis- 
criminate between two conditions: situ- 
ation with correct response being made 
and situation with this response not 
being made. And, since the rat is never 
shocked while thus responding and is 
shocked when responding differently or 
not at all, then the rat, feeling safe
while responding correctly and fearful 
at all other times, is differentially rein- 
forced on each occurrence of the correct 
response. In other words, by making a 
particular response, the animal, so to 
say, converts the situation from a dan- 
gerous one to a safe one and is thereby 
rewarded. Here the dilemma mentioned 
earlier—that there is no explicit danger 
signal in a situation of this kind whose 
termination can provide the specific
occasion for fear reduction and
reinforcement of the avoidance response—is
resolved by making the situation as a
whole, without correct responding, the
“danger signal,” which is then “turned
off” whenever and while the animal
makes the requisite response.

Fig. 3. Distribution of intertrial responses
made by a rat during Session 3 of an
experiment reported by Sidman (26). The abscissa
represents seconds of time elapsing between
the preceding response and the next one.
When the subject delayed by as much as 20
seconds between responses, it then received an
electric shock of brief (0.2 sec.) duration. As
will be noted, most responses occurred shortly
after the preceding one, rather than piling up
in the true “danger zone,” i.e., during the
latter part of the intertrial period (see text).

This is an eminently satisfactory 
theory (and is stated more simply and 
plausibly than Sidman’s somewhat
equivalent one); but it cannot, alone, 
account for all the now known facts. 
In 1954, in a paper entitled “The Tem- 
poral Distribution of Avoidance Re- 
sponses,” Sidman reported that “after 
continued training” there is a tendency 
for a “well developed time-discrimina- 
tion” to appear (26, p. 401). In two 
recent papers, Boren and Sidman (2, 29) 
show that, if training is continued long 
enough (to or beyond approximately 
100 hours), then interval-avoidance re- 
sponses do indeed take on the expected 
temporal distribution, i.e., they tend to 
pile up near the remote end of the inter- 
trial interval, rather than toward the 
beginning thereof (see Solomon and 
Brush’s account of Milner’s work, 33, 
p. 266). Hence, even though the Sid- 
man-Schaefer type of explanation be 
valid for the results first obtained 
in interval-response avoidance learning, 
such an explanation must obviously be 
supplemented if one is also to account 
for the opposite type of temporal dis- 
tribution of responses obtained after 
more extended training. One possi- 
bility, therefore, is that, in experimenta- 
tion of the kind under discussion, the 
first discrimination that is developed is 
a comparatively gross one, i.e., between
experimental situations with and with- 
out the correct response being in prog- 
ress, and that a second stage of dis- 
crimination later develops in which the 
subject discovers that situation-without- 
correct-responding is dangerous only 
after an interval of time has elapsed. 
There is, however, also the possibility 
that the stimulus-trace type of explana- 
tion can account for all the facts. Let 
us assume that fear is most strongly 
conditioned to a point on the stimulus 
trace 20 seconds removed from the event 
that sets up the trace (i.e., the preced- 
ing correct response, either forced or 
spontaneous). Let us further assume, 
as we reasonably may, that fear (at least 
in the beginning) generalizes far
forward on the trace. If, then, as already
posited, occurrence of the correct re- 
sponse during the intertrial period sets 
up a new trace and reduces fear, one 
might expect this response to occur just 
as soon as the subject starts being 
afraid. In other words, there is nothing 
in the nature of the situation which 
would require that the subject wait until 
maximally afraid before reacting defen- 
sively. Therefore, the temporal dis- 
tribution of the incidence of the correct 
overt response would not be expected 
necessarily to provide a faithful picture 
of the temporal pattern of fear intensity. 
Occurrence of the correct overt inter- 
trial response might very well pile up, 
not where the fear would be maximally 
intense, but rather where it is first ex- 
perienced. 

Eventually, of course, one would ex- 
pect discrimination to develop and to 
offset this original overgeneralization, so 
that fear would be first experienced 
during the intertrial period, not far for- 
ward (due to generalization along the 
trace), but instead toward the latter 
part of the period where the objective 
danger really is. 
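
The argument of the last two paragraphs can be illustrated numerically. The sketch below is a toy model under loudly stated assumptions, none of which comes from the text: fear is given a bell-shaped generalization gradient centered on the 20-second point of the trace, the subject is supposed to respond as soon as fear reaches a fixed threshold, and the function names, the gradient widths, and the threshold of 0.5 are all hypothetical. It shows only that a broad gradient places the first response early in the interval and a sharpened gradient places it late.

```python
import math


def fear(t, peak_s=20.0, width_s=15.0):
    """Hypothetical fear intensity (0 to 1) at t seconds on the trace.

    Assumption: a Gaussian-shaped generalization gradient centered
    on the point of the trace at which shock has been received.
    """
    return math.exp(-((t - peak_s) ** 2) / (2.0 * width_s ** 2))


def first_response_time(width_s, threshold=0.5, peak_s=20.0, step_s=0.1):
    """Earliest time (s) at which fear reaches the response threshold."""
    steps = int(round(peak_s / step_s))
    for k in range(steps + 1):
        t = k * step_s
        if fear(t, peak_s, width_s) >= threshold:
            return t
    return None


early_training = first_response_time(width_s=15.0)  # broad generalization
late_training = first_response_time(width_s=3.0)    # after discrimination
print(early_training, late_training)
```

With the broad gradient the threshold is crossed within the first few seconds of the interval; with the narrow one it is crossed only a few seconds before the 20-second point. This is the qualitative shift from early to late responding that the stimulus-trace account would require, though the sketch does nothing to test that account.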

In a study published in 1956 under 
the title “Time Discrimination and Be- 
havioral Interaction in a Free Operant 
Situation,” Sidman has shown that it is 
a simple matter to get rats to respond 
with considerable accuracy to a 20- 
second temporal interval when they are 
thirst-motivated and water-reinforced 
for pressing one bar, at the appropriate 
interval, after having pressed (on cue) 
another bar. Thus, as Sidman ob- 
serves: 


The procedure is an operant analogue of 
Pavlovian trace conditioning, with the quali- 
fication that the “trace” is initiated not by an 
exteroceptive stimulus but by the organism’s 
own behavior. . . . The response probability 
increases with the passage of time, not because 
of the dissipation of inhibition [as Hull (10) 
is credited with holding], but rather because 
the situation becomes more like that
prevailing at the time of reinforcement (28,
p. 469).


At first it may appear strange that a 
free-operant type of behavior (bar press- 
ing) can be temporally conditioned so 
readily when the subjects are thirst-mo- 
tivated; whereas, as already noted, in 
an avoidance-learning situation the same 
free operant for a long time occurs much 
too soon, i.e., long before the time when 
the unconditioned stimulus will actually 
occur. But there is an instructive differ- 
ence: in the situation when the subjects 
are thirst-motivated, responses that oc- 
cur too soon go unrewarded, i.e., do not 
produce water and are thus extinguished 
relatively rapidly; whereas, in the situa- 
tion when the subjects are fear-moti- 
vated, each and every occurrence of the 
“correct” response during the intertrial 
period, no matter how early in that 
period, produces a decrement in fear 
(however slight) and thus tends to be 
self-reinforcing, rather than self-extin- 
guishing. Here, as in many other situa- 
tions, we see the importance of differ- 
entiating between primary and second- 
ary drive and drive reduction. 


TWO DIFFERENT CONCEPTIONS OF
“AVERSIVE” STIMULATION 


In earlier sections of this paper, we 
have seen how numerous and determined 
the attempts have been to account for 
various forms of avoidance behavior 
without recourse to the concept of fear 
as an intervening variable. There is, 
of course, every justification in science 
for trying to push the principle of parsi- 
mony as far as possible. But it must be 
remembered that sometimes a given type 
of parsimony proves to be quite im- 
possible and has to be replaced by a 
more complex conceptual scheme. It is 
the present writers’ belief that a simple 
one-step (S-R) conception of behavior
is inadequate and that, at the very least, 
we must posit a two-step S-r:s-R model, 
where S and R, as in the simpler S-R
scheme, are “objective” stimulation and 
response, but where, in addition, r and 
s are intervening variables, of which fear 
(as both a reaction and as a drive) is 
a prime example. 

The attempt of Sheffield to interpret 
the Brogden-Lipman-Culler experiment 
(previously cited) solely in terms of the 
contiguity (stimulus-substitution) prin- 
ciple and thus to eliminate the necessity 
of positing fear as an intervening varia- 
ble illustrates the difficulties which such 
efforts commonly encounter. Says Shef- 
field: 


The findings for avoidable shock offer no 
evidence for a strengthening effect of avoid- 
ance and follow the conventional result in 
Pavlovian experiments. Reinforcement by 
shock strengthened conditioned running; omis- 
sion of shock led to extinction (24, pp. 174- 
175). 


The point can be illustrated by
referring back to Fig. 1. There it will be
seen that, in Sessions 2-T and 3-T, in
each of the two longest runs of
interval-responding, there is, indeed, a tendency
for this behavior to extinguish (as evi- 
denced by the flattening of the cumu- 
lative response curve) and to have to 
be eventually reinforced by shock pres- 
entation. Here, it might seem, is clear 
evidence that shock avoidance is not 
reinforcing and that the response of 
running is indeed dependent upon the 
occasional occurrence of the shock. The 
difficulty, of course, is the one that
always arises in such a situation as a
result of speaking as if only one response
were involved. The present writers as- 
sume that shock is, of course, occasion- 
ally necessary to reinforce the fear re- 
sponse, but that the running response is 
reinforced by fear reduction, not by
shock. It is our assumption that, when- 
ever enough fear is present to motivate 
intertrial running, there will be an ensu- 
ing experience of fear reduction, with the 
result that this behavior is always
reinforced. It is, rather, the underlying
fear that extinguishes and has to be 
eventually reinforced by presentation of 
shock. Our assumption is that, if fear 
could be maintained at an appropriate 
intensity by some other means, the in- 
tertrial running would never extinguish. 
This assumption is supported by the 
recently reported finding of Sidman, 
Herrnstein, and Conrad that: 


The effect of occasional free shocks, delivered 
independently of the monkeys’ behavior, was 
to increase the rate of avoidance responding.
... The effect of the free shock persisted over 
as long as 300 experimental hours during 
which time no other shocks were delivered 
(30, p. 557). 


The authors’ interpretation of this ef- 
fect, in terms of Skinner’s theory of 
superstitious behavior, seems to us con- 
trived and nonparsimonious. 

In an unpublished paper, Sidman, 
Herrnstein, and Boren try—but, we be- 
lieve, unsuccessfully—to develop a con- 
ceptual scheme which will account for 
all forms of avoidance learning without 
recourse to the concept of fear as a 
motivating and reinforcing agent. In 
essence, the position is pristine behavior- 
ism and stems directly, as the authors 
acknowledge, from the empirical re- 
searches and methodological stance of 
B. F. Skinner. In his 1953 book, Sci- 
ence and Human Behavior, Skinner 
states his position in this connection 
with stark simplicity. The chapter on 
“Emotion” begins with this sentence: 
“The ‘emotions’ are excellent examples 
of the fictional causes to which we com- 
monly attribute behavior” (32, p. 160). 
Then the familiar, but highly debatable, 
assertion by William James is approv- 
ingly quoted to the effect 


that we feel sorry because we cry, angry be- 
cause we strike, afraid because we tremble, 
and not that we cry, strike, or tremble because 
we are sorry, angry, or fearful, as the case 
may be (32, p. 160). 
And, later in the chapter, there is a
section specifically headed “Emotions 
Are Not Causes.” 

Rarely are theoretical issues (in psy- 
chology) so beautifully explicit and 
“clean.” The present writers, along 
with many others, take the position that 
the emotion of fear is causal; that a one- 
step (simple S-R) psychology will not 
work (17, 21); that, as a minimum, two 
causal steps are necessary to account 
for behavior; and that fear, in the case 
of so-called avoidance behavior, is an 
essential intermediate cause or determi- 
nant. 

On the other hand, Skinner and those 
of like persuasion hold that this is not 
at all the case and that a “chaining in- 
terpretation,” not greatly different from 
the earlier Watsonian notion of reflex 
chaining, is “capable of handling most 
of the data and of extension to other 
findings, while at the same time main- 
taining a degree of parsimony and in- 
ternal consistency that we were unable 
to achieve by means of the alternative 
considerations” (unpublished manuscript 
by Sidman-Herrnstein-Boren). 

This is not the time or place to enter 
into a detailed critique of the chaining 
interpretation, but one particularly
questionable aspect thereof may be
pertinently noted, namely, the tendency to
speak of stimuli which have been paired 
with punishment as being aversive. 
This, we believe, involves a logically in- 
defensible attempt to avoid the more 
likely possibility that such stimuli, after 
conditioning, become aversive because, 
and only in the sense that, they now 
elicit fear and that it is this latter state, 
and not the eliciting stimulus, which 
provides the aversive, motivating ele- 
ment in the situation (cf. Solomon and 
Brush, 33, p. 245 ff.). 


SUMMARY 


It is now empirically established (a) 
that responses which occur between 
trials in a noxious conditioning situation
occur progressively less frequently if
trials occur ineluctably, i.e., at fixed
intervals, and (b) that between-trial
responses occur progressively more
frequently when the next trial is delayed
each time such a response is made.
There is, of course, no mystery as to 
why intertrial (spontaneous) responses 
disappear in the former situation: they 
accomplish nothing and, if they happen 
to occur late in the intertrial interval, the 
ensuing noxious stimulus acts, in effect, 
as a punishment. 

But there has been considerably less 
agreement as to how it is, under the 
second set of conditions mentioned, that 
intertrial responses are reinforced and 
perpetuated. A number of writers have 
taken the position that here the inter- 
trial response (made to the experimental 
situation as a whole, rather than to a 
specific danger signal) is just the ante- 
dating (classically conditioned) form 
of the response made to the noxious, un- 
conditioned stimulus on the trial. In 
the present paper various considerations 
are cited which indicate that this ex- 
planation is too simple and that a more 
complex explanation, involving the in- 
tervening advents of fear arousal and 
fear reduction, is necessary and gen- 
erally sufficient. 

Intertrial avoidance learning, as a 
laboratory phenomenon, deserves con- 
tinued study, both because of its in- 
trinsic theoretical interest and because 
of its rather patent relationship to com- 
pulsive (neurotically repetitive) be- 
havior and to the psychology of work. 

The study of cumulated jnd scales
began with Fechner; Fechner’s law is 
such a scale. Psychophysicists have 
been deriving such scales and com- 
paring them with scales derived in 
other ways, notably by fractionation, 
ever since, and a lot of controversy 
has resulted. The controversy is 
particularly hot at present because 
Stevens and Galanter (16) and Stevens 
(14) have assembled a lot of data 
which indicate that cumulated jnd 
scales do not agree with magnitude 
scales derived by other methods for 
intensity continua such as loudness,
brightness, and pain.


Unfortunately, Fechner’s proce- 
dure for cumulating jnds, which has 
been widely defended but not widely 
applied since his day, rests on an as- 
sumption which is inconsistent with 
one of his definitions. This means
that cumulated jnd scales developed 
by his procedure are incorrect, and 
so comparisons between them and 
other kinds of scales are meaningless. 

This paper begins by showing that 
Fechner’s method contains internal 
contradictions for all but a few special 
cases, and that it cannot be rescued 
by minor changes. It goes on to 
derive a new and mathematically ap- 
propriate method for cumulating jnd’s. 
This method turns out to be the sim- 
plest possible one: you can best cumu- 
late jnd’s simply by adding them on 
top of each other, like a stack of 
plates. Unfortunately, the detailed 
mathematical equivalent of this very 
simple operation is often fairly com- 
plicated. A simple but sometimes 
tedious graphic procedure, however, 
is readily available—and indeed has 
customarily been used by most sci- 
entists when developing cumulated 
jnd scales. This paper ends by dis- 
cussing practical applications of this 
method, the relation it bears to scal- 
ing methods based on the law of 
comparative judgment, and the cur- 
rent controversy about scaling meth- 
ods in psychophysics. 
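
The “stack of plates” operation described above can be sketched in a few lines of code. This is an illustration under stated assumptions, not the derivation given later in the paper: the function name `cumulate_jnds` is hypothetical, and the jnd function used in the example (Weber's law with a fraction of 0.1) is chosen only because it gives a result that can be checked against a closed form.

```python
import math


def cumulate_jnds(x0, x_max, jnd_size):
    """Stack jnd's on top of one another, starting at stimulus x0.

    jnd_size(x) gives the size of the jnd at stimulus value x and
    must be positive. Returns [x_0, x_1, ...], where x_n lies
    exactly n jnd's above x_0 and no value exceeds x_max.
    """
    xs = [x0]
    while True:
        nxt = xs[-1] + jnd_size(xs[-1])
        if nxt > x_max:
            break
        xs.append(nxt)
    return xs


# Hypothetical example: Weber's law, jnd = k * x with k = 0.1.
k = 0.1
steps = cumulate_jnds(1.0, 100.0, lambda x: k * x)
n_jnds = len(steps) - 1

# Under Weber's law x_n = x_0 * (1 + k)**n, so the count of whole
# jnd's between 1 and 100 is floor(log(100) / log(1 + k)).
closed_form = math.floor(math.log(100.0) / math.log(1.0 + k))
print(n_jnds, closed_form)
```

The scale value assigned to a stimulus is then simply the number of plates stacked beneath it. For a jnd function that is not proportional to the stimulus no such closed form need exist, but the stepping procedure itself is unchanged.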

The model of a sensation scale. The 
psychophysical model of a sensation 
scale is a mathematical model; a 
sensation scale is an intervening
variable. The rules by which sensa- 
tion scales should be constructed are 
to some degree arbitrary, limited by 
logic, convenience, intuition, and best 
fit to data. 
DERIVATION OF SUBJECTIVE SCALES


The model of a sensation scale goes 
as follows. Corresponding to many 
of the major subjective dimensions of 
change of sensory experience, there 
are primary physical dimensions of 
change (e.g., pitch and frequency, 
loudness and amplitude, etc.). Once 
parametric conditions for significant 
variables have been specified, we 
assume that a single-valued, monotonic,
everywhere-differentiable
(smooth) function exists that relates 
the subjective dimension to its cor- 
responding physical dimension. 
From here on, we shall use the words 
“dimension” and “continuum” interchangeably;
we shall usually talk
about a stimulus continuum and its 
corresponding sensory continuum. 

That is the model, and it is very 
easy to state. The big difficulty 
comes when we try to decide how to 
fit data to it. All methods for doing 
this must introduce definitions and assumptions
beyond those listed in the
previous paragraph. These differ
from one method to another. 

The oldest sensory scaling method, 
Fechner’s, is based upon a further 
condition that says that any jnd on a
given sensory continuum is subject- 
ively equivalent to any other jnd on 
that continuum. Whether this added 
condition is to be interpreted as 
merely a definition of the scale under 
consideration or as an assumption is 
a matter of opinion. Textbooks usually
say Fechner “assumed” that all
jnd’s for a sensory dimension are 
equal to one another (1). It is not 
easy to know what he had in mind, but 
judging by his writings he probably 
did view it as an assumption having 
implications beyond scale construc- 
tion. Since it is not directly observ- 
able and since its indirect conse- 
quences are highly debatable, others 
since Fechner have suggested that it 
might better be viewed as a definition. 
It is our view that this is the more
sensible position; it certainly is the 
one for us to take in this paper since 
our point is a logical one, not a sub- 
stantive one. If equality of jnd’s is 
taken as a definition, then it cannot be 
proved or disproved by any kind of 
empirical evidence. An experiment, 
for instance, that showed that a tone 
20 jnd’s loud is not half as loud (ac- 
cording to fractionation judgments) 
as a tone 40 jnd’s loud would not 
have any relevance to what we may 
call Fechner’s definition; it would only
show that the kind of sensation scale 
implied by his definition does not 
agree with the kind implied by the 
definitions used in fractionation ex- 
periments. The issue, then, becomes 
what are the different scales useful for, 
and what is their relationship one to 
another. The latter part of this 
paper touches briefly on this problem. 

The main purpose of the paper is to 
explore the consequences of Fechner’s 
definition. Therefore, we must be 
certain of its meaning. It clearly 
does not mean, for instance, that all 
jnd’s for loudness contain the same
number of physical units. It is only 
on the sensory continuum, not on the 
stimulus continuum, that jnd’s are 
defined as equal to one another. 
Furthermore, this definition holds 
only if all stimulus properties except 
those on the primary stimulus con- 
tinuum remain constant. So there 
is no reason, for instance, to expect 
that the subjective size of a loudness 
jnd at 1,000 cycles per second (cps) 
should be the same as that of a loud- 
ness jnd at 4,000 cps. Of course, it 
would be pleasant if they were equal 
in size, but the model does not require 
it. 

From here on we will talk about 
two kinds of jnd’s. A sensation jnd 
is the magnitude of a jnd as measured 
in the units of the appropriate sensation
continuum. By definition, all
sensation jnd’s for a given sensation 
continuum are equal to one another, 
given unchanged values of all stimu- 
lus properties except those on the pri- 
mary stimulus continuum. A stimu- 
lus jnd is the magnitude of the 
change on the primary stimulus con- 
tinuum, measured in appropriate 
physical units, which is just sufficient 
to produce a change of one sensation 
jnd upward at that point. (A dis- 
cussion of the essentially statistical 
nature of jnd’s appears later in this 
paper.) In general, stimulus jnd’s 
will have different sizes at different 
points on the primary stimulus con- 
tinuum. The rest of this paper will 
not be intelligible unless you keep the 
distinction between these two kinds 
of jnd’s in mind. 

We have assumed that jnd’s are 
measured upward on the stimulus 
continuum. They could also be meas- 
ured downward, and the possibility 
exists that the two measurements 
might not agree. In fact, they are 
certain not to agree if the distance 
spanned is more than two jnd’s, and 
if the size of the jnd at the end where 
measurement starts is used as the 
unit of measurement, since this means
that the size of the measurement unit 
will be different depending on direc- 
tion of measurement. However, such 
discrepancies might exist in the meas- 
urement of a single jnd; this, if it 
happened, would mean that jnd’s are 
not suitable units of measurement 
unless direction is specified. We have 
therefore confined ourselves to upward 
jnd’s. 

Now we can say exactly what this 
paper is about. Given a function 
(obtained from experiment, theory, 
or both) relating stimulus to sensation 
jnd’s for all points of the primary 
stimulus continuum, what may we infer
about the sensory scale implied by
that jnd function?

Fechner’s derivation of Fechner’s 
law. On October 22, 1850, Fechner 
(2) thought up the first (incorrect) 
answer to the question which ended 
the previous paragraph. Let us call 
any function that gives the size of a 
stimulus jnd at each point of the 
stimulus continuum a Weber function 
(corresponding to ‘‘a function relating 
stimulus to sensation jnd’s’’ of the 
previous section), and any one-to-one 
function based on cumulated jnd’s 
which relates the stimulus continuum 
to a sensory scale a Fechner function 
(corresponding to ‘‘a sensation scale’’ 
of the previous section). These def- 
initions do not restrict our attention 
to those two special functions which 
have come to be known in psycho- 
physics as Weber's law and Fechner’s
law. Fechner believed that the
Fechner function corresponding to 
any Weber function could be expressed 
as the solution (integral) of a first 
order linear differential equation in- 
volving that Weber function. He 
applied this procedure to Weber's 
law, which asserts that for a given
stimulus continuum the size of the
stimulus jnd, $\Delta x$, divided by the
value of the stimulus at that point, x,
is a constant ($\Delta x / x = k$). Let us
examine his argument. 

If Weber's law is true, then, since 
all sensation jnd’s are equal by defini- 
tion, there is a constant A such that 

$$\frac{\Delta u}{\Delta x} = \frac{A}{x} \qquad [1]$$


where $\Delta u$ denotes the size of the sensation
jnd. The heart of Fechner’s
solution to his and our basic problem
was to “rewrite” Equation 1 as the
differential equation 

$$\frac{du}{dx} = \frac{A}{x} \qquad [2]$$

How did Fechner make this step from
differences (deltas) to differentials?
He used what he called a “mathematical
auxiliary principle,” the essence
of which is that what is true for differ- 
ences as small as jnd’s ought also to be 
true for all smaller differences and so 
true in the limit as they approach 
zero (differentials). If this argument 
were acceptable (which it is not), the 
rest would be simple. Equation 2, 
when integrated, yields the familiar 
logarithmic relationship between sen- 
sation and stimulus which is known as 
Fechner’s law. 
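
As a brief worked check of the Weber's-law case (B, the constant of integration, is notation assumed here for illustration), integrating Equation 2 and then stepping up by one stimulus jnd, $\Delta x = kx$, gives a sensation increment that does not depend on x:

```latex
u(x) = \int \frac{A}{x}\,dx = A \log x + B,
\qquad
u(x + kx) - u(x) = A \log\frac{x + kx}{x} = A \log(1 + k).
```

So for Weber's law, at least, the logarithmic function is consistent with the definition of equal sensation jnd’s.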

Fechner thought that his general 
procedure ought to be applicable to 
any Weber function, not just to 
Weber's law. It is not. Except for 
a few special cases like Weber’s law, 
the definition of sensation jnd’s as 
equal and the “mathematical auxiliary
principle” are mutually contradictory.
For example, consider the
Weber function $\Delta x / x^2 = k$. Then,
following Fechner’s procedure, we
should write:

$$\frac{\Delta u}{\Delta x} = \frac{A}{x^2} \quad\text{and so}\quad \frac{du}{dx} = \frac{A}{x^2}$$

Integrating, we get

$$u = B - \frac{A}{x}$$

Let us now check to see whether this 
new Fechner function satisfies the 
definition which says that sensation 
jnd’s are equal to one another. If we 
are at point x on the stimulus contin- 
uum, a stimulus jnd, according to the 
Weber function used in this example, 
is $kx^2$. The sensation increment corresponding
to this change, the sensation
jnd at this point, is therefore
given by:

$$u(x + kx^2) - u(x) = \left(B - \frac{A}{x + kx^2}\right) - \left(B - \frac{A}{x}\right) = \frac{Ak}{1 + kx}$$

which is clearly not a constant for any
value of the constant A except zero. 
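
The discrepancy can also be seen numerically. The following is a minimal sketch, not part of the original argument; the function names and the values k = 0.1, A = 1, B = 0 are arbitrary assumptions chosen only for illustration.

```python
# Fechner's integral "solution" for the Weber function  delta_x = k * x**2
# is u(x) = B - A / x.  A, B and k below are arbitrary illustrative values.

def fechner_u(x, A=1.0, B=0.0):
    return B - A / x

def sensation_jnd(x, k, A=1.0):
    """Sensation increment produced by one stimulus jnd (k * x**2) starting at x."""
    return fechner_u(x + k * x**2, A) - fechner_u(x, A)

k = 0.1
xs = (1.0, 2.0, 5.0, 10.0)
jnds = [sensation_jnd(x, k) for x in xs]
closed_form = [1.0 * k / (1 + k * x) for x in xs]   # A*k / (1 + k*x) with A = 1

for x, j in zip(xs, jnds):
    print(x, j)   # the increment shrinks as x grows, so the jnd's are not equal
```

If the sensation jnd’s were equal, every printed increment would be the same number.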

This, although only one example, is 
typical in the sense that almost any 
example you could think of would 
show the same discrepancy. Only for 
a very few Weber functions (some
pathological ones, Weber’s law, and
its generalization $\Delta x = kx + c$) does
the “mathematical auxiliary principle”
yield a Fechner function with
equal jnd’s. We will not take space 
to prove this formally, but a formal 
proof is available. 

The functional equation solution. 
We have shown that Fechner’s pro- 
cedure involves a self-contradiction. 
We shall show later that it leads to 
wrong results in all important cases 
except Weber’s law. Obviously the 
“mathematical auxiliary principle” 
is wrong and must go. 

How, then, should we cumulate 
jnd’s? The simplest, most obvious 
procedure (which has very often been 
used exactly because it is simplest and 
most obvious) is simply to add them 
up one at a time. If the first jnd on
a primary stimulus continuum is 5
stimulus units, then two points on our
cumulated jnd scale should be (0, 0) and
(1, 5), where the first number is the
scale value on the y axis and the
second number is the corresponding
stimulus value on the x axis. If we
then find that the size of the stimulus
jnd at 5 on the stimulus continuum
is 8, then the third point is (2, 13). If
we find that the size of the stimulus
jnd at 13 is 10, then the fourth point
is (3, 23), and so on.
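
The stacking procedure just described can be sketched in a few lines. This is only an illustration: the function name `cumulate_jnds` and the lookup table of jnd sizes are assumptions made here, with the three jnd sizes taken from the example in the text.

```python
def cumulate_jnds(weber, start, n_steps):
    """Return [(scale value, stimulus value), ...] built by stacking jnd's.

    `weber(x)` gives the size of the stimulus jnd at stimulus value x.
    """
    points = [(0, start)]
    x = start
    for n in range(1, n_steps + 1):
        x = x + weber(x)          # step up by one stimulus jnd
        points.append((n, x))
    return points

# Hypothetical jnd sizes, taken from the worked example in the text.
jnd_table = {0: 5, 5: 8, 13: 10}
points = cumulate_jnds(lambda x: jnd_table[x], start=0, n_steps=3)
print(points)  # [(0, 0), (1, 5), (2, 13), (3, 23)]
```

With a Weber function given as a formula rather than a table, the same loop applies unchanged; it produces only the whole-jnd points, not the curve between them.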

Fechner and some of his more 
modern imitators went way out of 
their way to avoid this simple and 
sensible procedure; in retrospect it is 
hard to decide why they did so. At 
any rate, the next two sections of this 
paper will develop a formal mathematical
solution to Fechner’s mathematical
problem, a solution which turns
out to be the mathematical equivalent
of the simple graphical or arithmetic 
technique discussed in the previous 
paragraph. The mathematical prob- 
lem centers about how to fill in the 
curve between the discrete points 
arrived at by the graphical method. 

What mathematical tools can we 
use to replace Fechner’s “mathematical
auxiliary principle”? Equation 1,
and the corresponding ones based on 
other Weber functions, can be solved 
directly without any mathematical 
auxiliary principles or other further 
assumptions. They are examples of 
what mathematicians call functional 
equations. The papers on which 
most of our discussion is based (5, 6) 
were published in the 1880s, twenty 
years after Fechner first published 
his work. 

The kind of functional equation 
implied by the definition of equality of 
sensation jnd’s is soluble for a very 
wide class of Weber functions. Unfortunately,
there is an infinity of
inherently different solutions to each
of these equations. However, further
consideration of what we mean by a 
sensation scale will lead us to proper- 
ties which we usually take for granted 
and which are enough to narrow the 
solutions down to just one interval 
scale, unique except for its zero point 
and unit of measurement. It is inter- 
esting that in the case of the linear 
generalization of Weber’s law, and in 
that case only, the functional-equation 
solution is the same as that obtained 
by Fechner’s auxiliary principle; for 
all other Weber functions the two 
solutions are different. 

First, we will state the general 
mathematical problem and its solu- 
tion. Let x, x > 0, denote a typical 
value of the stimulus continuum, and 
let u denote the (unknown) Fechner 
function. Let g be the (given) Weber 
function; i.e., a stimulus magnitude 
y, y > x, is detected as larger (in a
statistical sense) than x if $y \ge x + g(x)$,
whereas it is not discriminated as
different from x if $x \le y < x + g(x)$.
We write x + g(x) = f(x). By definition,
a sensory jnd at the sensation
u(x) is given by the increment²

$$u[f(x)] - u(x)$$

(In the usual “delta” notation, $g(x) = \Delta x$
and $u[f(x)] - u(x) = \Delta u$.)
The condition that sensation jnd’s be 
equal simply means that all sensation 
jnd’s are a constant, which we may 
take to be 1 for convenience, since an 
arbitrary change of unit does not 
matter. Thus, we have our major 
mathematical problem: 


Find those real-valued differentiable
functions u, defined for
all x > 0, such that $u[f(x)] - u(x) = 1$,
for all x > 0.


Note that we have said those func- 
tions, not that function, for there may 
be more than one such function. 
This uniqueness question has not 
traditionally been raised, for so long 
as the problem was formulated in 
terms of linear differential equations, 
the uniqueness theorems of that 
branch of mathematics insured only 
one solution. In the realm of func- 
tional equations, we have no such 
assurances. 

² Throughout this paper we shall have to
use functions of functions. In general, if v
and w are two real-valued functions of a real
variable x, v[w(x)] denotes the number obtained
by calculating y = w(x) and then finding
v(y). Clearly, the order of writing v and
w is material, for v[w(x)] does not generally
equal w[v(x)]. Consider, for example, $v(x) = ax$,
where $a \ne 1$, and $w(x) = x^2$. Then
$v[w(x)] = v(x^2) = ax^2$, whereas $w[v(x)] = w(ax) = a^2x^2$.

It is very lucky that the functional
equation which has arisen in this problem
is one of the more famous in the
literature; it is called Abel’s equation.³
The principal results we shall need 
concerning this equation were pre- 
sented by Koenigs (5, 6) in 1884 and 
1885.⁴ First, we will present his
uniqueness results, which illustrate the 
method of attack and lead up to the 
general solution. Suppose that $u_0(x)$
is a solution to Abel's equation, and
suppose p(x) is an arbitrary periodic
function with period 1, in other words,
any function satisfying


p(x + 1) = p(x) 


$K \sin 2\pi x$ is periodic with period 1,
and so is an example of a function
p(x). It is easy to show that the
function $u_1(x) = u_0(x) + p[u_0(x)]$ is
also a solution to Abel’s equation:

$$\begin{aligned}
u_1[f(x)] &= u_0[f(x)] + p\{u_0[f(x)]\} \\
&= 1 + u_0(x) + p[1 + u_0(x)] \\
&= 1 + u_0(x) + p[u_0(x)] \\
&= 1 + u_1(x)
\end{aligned}$$

Furthermore, it can be shown that if
u and u* are two solutions to Abel's
equation, then there exists a periodic
function p with period 1 such that

$$u(x) = u^*(x) + p[u^*(x)]$$

Thus, if we have any solution $u_0$ to
our problem and if we choose p to be a
differentiable periodic function with
period 1, then $u_1 = u_0 + p(u_0)$ is also
differentiable and solves the problem.

³ Sometimes this equation is spoken of as
the Abel-Schröder equation, but more often
Abel's name is attached to this equation and
Schröder’s name to the equation $v[f(x)] = cv(x)$,
which arises from Abel’s equation
through the substitution $v = c^u$.

⁴ We are indebted to Richard Bellman of
the RAND Corporation for directing us to the
literature on the Abel equation.

In the case of Weber's law, we
have f(x) = kx, k > 1, and the differentiable
function $u_0(x) = \log x / \log k$ is
easily shown to satisfy the condition
of equal sensation jnd’s. Therefore,

$$\frac{\log x}{\log k} + p\left(\frac{\log x}{\log k}\right)$$

is also a solution if p is differentiable
and periodic with period 1.
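
The non-uniqueness in the Weber's-law case is easy to confirm numerically. The sketch below is illustrative only; the constant K = 1.5 and the particular periodic function (a sine of amplitude 0.1) are assumptions chosen here, not values from the text.

```python
import math

K = 1.5  # hypothetical Weber-law constant in f(x) = K * x, K > 1

def f(x):
    return K * x

def u0(x):
    return math.log(x) / math.log(K)

def p(z, amplitude=0.1):
    # Any differentiable function with period 1 would do here.
    return amplitude * math.sin(2 * math.pi * z)

def u1(x):
    return u0(x) + p(u0(x))

xs = [0.5, 1.0, 3.0, 20.0]
jnd_u0 = [u0(f(x)) - u0(x) for x in xs]
jnd_u1 = [u1(f(x)) - u1(x) for x in xs]
print(jnd_u0)  # each entry is 1 up to rounding
print(jnd_u1)  # so is each of these, although u1 differs from u0
```

Both functions give equal sensation jnd’s at every x tried, yet they disagree about every stimulus value that is not a whole number of jnd’s from x = 1.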

There is an infinity of such functions
p, and so an infinity of different
solutions to the problem for any
Weber function, including Weber's
law. This, of course, is quite unsatisfactory;
later on we will show that one
of the properties that we usually at- 
tribute to jnd’s, and which as yet we 
have not used, enables us to insure a 
unique solution. However, first it 
will be useful to present Koenigs’s re- 
sults on the existence of solutions to 
Abel's equation. 

The existence of solutions to Abel's 
equation. In psychophysical problems,
there is always a threshold
R > 0, such that g(x) is not observable
in the range 0 < x < R. Thus,
it is only a matter of convenience what
we assume about the behavior of g
near 0; we shall suppose that

$$g(0) = 0 \quad\text{and}\quad 0 < g'(0) < \infty$$

where $g'(x) = dg/dx$. It is known also
from experimental work that g is
never 0 and that on the whole it will 
increase with x, except for limited 
ranges of some stimuli, where it may 
decrease slowly. With little or no 
loss of generality, we may suppose it 
never decreases so rapidly as to have a 
slope less than —1. In other words, 
we also assume: 


$$g(x) > 0 \quad\text{and}\quad g'(x) > -1 \quad\text{for } x > 0$$


From these assumptions, it follows
that f(x) = x + g(x) has these properties:

f is strictly monotonic in x, i.e., if
x < y, then f(x) < f(y); 0 is the
only fixed point of f (x is a fixed
point if f(x) = x); and $1 < f'(0) < \infty$.

The strict monotonicity of f implies
that there exists an inverse function
$f^{-1}$, i.e., a function such that

$$f^{-1}[f(x)] = x = f[f^{-1}(x)]$$

It is easy to show that:

$f^{-1}$ is strictly monotonic increasing,
x is a fixed point of $f^{-1}$ if and
only if x = 0, and $0 < (f^{-1})'(0) < 1$.


Observe that if we know a solution v
to the equation

$$v[f^{-1}(x)] = 1 + v(x) \qquad [3]$$

then $u = -v$ is a solution to

$$u[f(x)] = 1 + u(x) \qquad [4]$$


So it will suffice to deal with $f^{-1}$. If,
in addition to the three properties
mentioned, $f^{-1}$ is analytic, i.e., if there
exist constants $a_i$ such that

$$f^{-1}(x) = \sum_{i=0}^{\infty} a_i x^i,$$

then Koenigs has shown that a differentiable
solution exists to Abel's
equation. In applications, analyticity
is no real restriction. For simplicity
of notation, let us denote $f^{-1}$ by h;
then Koenigs’ theorem (which is not
easy to prove) may be expressed as
follows: Let $h^{(n)}$ denote the nth iterate
of h (i.e., $h^{(n)}(x)$ is the result of n
successive applications of h beginning
at the point x), and let

$$\phi(x) = \lim_{n \to \infty} \frac{h^{(n)}(x)}{[h'(0)]^n};$$

then $\phi$ exists and is differentiable, and

$$v_0(x) = \frac{\log \phi(x)}{\log h'(0)}$$

is a solution to Abel’s Equation 3.
Therefore, since $h'(0) = 1/f'(0)$,

$$u_0(x) = \frac{\log \phi(x)}{\log f'(0)}$$

is a solution to Equation 4 and so to
our problem.
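
To make the limit concrete, the sketch below approximates it by a finite number of iterations for one hypothetical Weber function, g(x) = x + x², chosen here (it does not appear in the text) because f(x) = 2x + x² = (1 + x)² − 1 can be inverted by hand and the limit then works out to log(1 + x). The names and the iteration count n = 50 are assumptions for illustration.

```python
import math

def g(x):
    # Hypothetical Weber function: g(0) = 0, g'(0) = 1 > 0, g(x) > 0 for x > 0.
    return x + x**2

def f(x):
    return x + g(x)              # f(x) = 2x + x**2 = (1 + x)**2 - 1

def h(y):
    # Inverse of f for y >= 0, written to avoid cancellation for small y:
    # sqrt(1 + y) - 1 == y / (sqrt(1 + y) + 1)
    return y / (math.sqrt(1.0 + y) + 1.0)

H_PRIME_0 = 0.5                  # h'(0) = 1 / f'(0), and f'(0) = 2

def phi(x, n=50):
    """Approximate the limit of h^(n)(x) / h'(0)**n using a finite n."""
    y = x
    for _ in range(n):
        y = h(y)
    return y / H_PRIME_0**n

def u0(x):
    return math.log(phi(x)) / math.log(2.0)   # log f'(0) = log 2

print(phi(3.0), math.log(4.0))   # the two values agree closely
print(u0(f(1.0)) - u0(1.0))      # one stimulus jnd corresponds to one scale unit
```

For most Weber functions no such closed-form inverse is available, so each evaluation of h would itself require a numerical root-finder.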

The difficult part of the proof is to 
show that the limit exists. Assuming 
that it does, it is easy to show that 
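$u_0(x)$ is a solution. Since h[f(x)] = x,

$$\begin{aligned}
u_0[f(x)] &= \frac{\log \phi[f(x)]}{\log f'(0)} \\
&= \frac{1}{\log f'(0)} \log \lim_{n \to \infty} \frac{h^{(n)}[f(x)]}{[h'(0)]^n} \\
&= \frac{1}{\log f'(0)} \log \lim_{n \to \infty} \frac{h^{(n-1)}(x)}{[h'(0)]^{n-1}\, h'(0)} \\
&= \frac{\log f'(0) + \log \lim_{n \to \infty} \dfrac{h^{(n-1)}(x)}{[h'(0)]^{n-1}}}{\log f'(0)} \\
&= 1 + u_0(x)
\end{aligned}$$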


The evaluation of the above limit
for $\phi$ is rarely a simple task. Furthermore,
the conditions under which
it has been shown to exist and to provide
a solution to Abel’s equation are
only sufficient conditions; there are
other circumstances in which solutions
exist. For example, the function
$f(x) = ax^b$, $b \ne 1$, fails to satisfy
$1 < f'(0) < \infty$, yet by direct verification
one can show that

$$u_0(x) = \frac{\log \log \left[a^{1/(b-1)} x\right]}{\log b}$$

satisfies $u_0(ax^b) = 1 + u_0(x)$. The
function $f(x) = x + ax^b$ also fails to
meet the same condition, but a solu- 
tion probably exists in this case too. 
Presumably, other functions can be 
found which approximate empirical 
data and which meet the assumed con- 
ditions, but it remains to be seen 
whether the limit $\phi$ can be evaluated.
The difficulty is, first, in inverting f,
and second, in finding a simple expression
for $h^{(n)}$. Since this is generally
difficult, we doubt that the mathematics
of this section will be useful to
psychophysicists who want a non- 
graphic method for cumulating jnd’s. 
It should be pointed out again that 
for the empirically important Weber 
function $g(x) = kx + c$ the solution is
known: it is

$$u(x) = \frac{\log (kx + c)}{\log (1 + k)}$$
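
A quick numerical check of this solution, and of the earlier remark that in the linear case it agrees with Fechner's integral apart from the unit of measurement, might look as follows. The constants k = 0.2 and c = 1 and the function names are assumptions for illustration only.

```python
import math

k, c = 0.2, 1.0          # hypothetical constants in g(x) = k*x + c

def f(x):
    return x + (k * x + c)

def u_functional(x):
    return math.log(k * x + c) / math.log(1 + k)

def u_integral(x):
    # Fechner's integral of dx / g(x), constant of integration set to 0.
    return math.log(k * x + c) / k

xs = [0.0, 1.0, 10.0, 100.0]
jnds = [u_functional(f(x)) - u_functional(x) for x in xs]
ratios = [u_integral(x) / u_functional(x) for x in xs[1:]]
print(jnds)    # each sensation jnd is 1 up to rounding
print(ratios)  # a constant ratio: the two scales differ only in unit
```

The constant ratio is log(1 + k)/k, so the integral scale is the functional-equation scale measured in a slightly different unit.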


A further definition of the sensation 
continuum. So far we have examined 
two formulations of Fechner’s prob- 
lem, both of which are unsatisfactory. 
The first, that of Fechner, contains an 
internal contradiction. The second, 
the functional equation formulation, 
we have shown can be solved. Un- 
fortunately, we have also shown that 
it has infinitely many families of 
different solutions, which is intoler- 
able. In this section we shall propose 
an addition to the second formulation 
which amounts to a method of sum- 
mating jnd’s. We shall show that if 
we demand a particular form of in- 
variance of distances measured in jnd 
units, then there is a unique (except 
for zero and unit) sensation scale for 
each of a wide variety of Weber func- 
tions, and for Weber's law this sensa- 
tion scale is Fechner’s law. 

The common psychological custom 
for measuring distances in jnd’s be- 
tween two points is to use the size of 
the jnd at the lower point as the unit 
of measurement. Although it is rarely 
if ever explicitly stated, it is certainly 
implicitly assumed that if the distances
ab and cd are both $\alpha$ stimulus
jnd’s in length, then they have an
equal number, say $K(\alpha)$, of sensation
jnd’s. As a formal mathematical
condition, this states that

$$u[x + \alpha g(x)] - u(x) = K(\alpha)$$

where K is some fixed, but unknown,
function of $\alpha$. It can be shown, first,
that if u is a solution to this problem,
then it must be the integral

$$\int \frac{dx}{g(x)}$$

given by Fechner, but, second, that
there are no solutions except when
g(x) = cx (Weber's law). We will
not present a proof of this result since
it is a blind alley, but we believe that
it suggests that this customary measurement
of distances should be abandoned.

We must now consider how such
distances really should be measured.
If x and y are more than one jnd
apart, we may expect the size of the
jnd to change as we go from x to y.
That fact should be taken into account
in using jnd’s as units of measurement;
failure to take it into account is
what makes Fechner’s auxiliary principle
and the standard measuring procedure
unacceptable. We shall proceed
to formulate this more sensible
method of using jnd’s as measuring
units.

Let f(x) = x + g(x); then the point
f(x) is one x-jnd larger than x. The
point $f[f(x)] = f^{(2)}(x)$ is one f(x)-jnd
larger than f(x). In general $f^{(n)}(x)$ is
one $f^{(n-1)}(x)$-jnd larger than the point
$f^{(n-1)}(x)$. Clearly, for y > x, we can
find some integer n such that

$$f^{(n)}(x) \le y < f^{(n+1)}(x)$$

and it is reasonable to say that y is
between n and n + 1 jnd’s larger than
x. For the moment, let us suppose
that y was chosen so that $y = f^{(n)}(x)$;
then we can say y is exactly n jnd’s
larger than x. It seems plausible to
require that the same be true of the
sensory continuum, i.e.,

$$u[f^{(n)}(x)] - u(x) = n$$


In words, we are saying that if point 
y is 20 stimulus jnd’s higher than 
point x on the stimulus continuum,
then it must also be 20 sensation
jnd’s higher than point x on the sensation
continuum. If the above condition
is met for n = 1 (in other words,
if all sensation jnd’s for a given sensory
continuum are equal), then it must
also be met for all larger values of n,
since

$$\begin{aligned}
u[f^{(n)}(x)] - u(x) &= u\{f[f^{(n-1)}(x)]\} - u(x) \\
&= 1 + u[f^{(n-1)}(x)] - u(x) \\
&= n
\end{aligned}$$


But this takes care of relatively few 
points, and does not allow us to say 
exactly how many jnd’s y is from x un- 
less the difference is a whole number of 
jnd’s. We must find a definition 
which tells us how to subdivide a jnd 
into fractional parts. How to do this 
is not obvious, since the definition of 
distances given above involves iterates 
of f, and these are apparently defined 
only for integers. Fortunately, it is 
possible to generalize the notion of 
an iterate to arbitrary, rather than 
integral, indices. This problem is 
closely related to that of Abel’s func- 
tional equation which Koenigs ex- 
amined; we shall be able to use his 
results. 

First, we can set up some properties
that a generalized iterate $f^{(t)}(x)$,
where t is any non-negative number,
should meet. In essence, they
amount to stipulating that $f^{(t)}(x)$
should coincide with the usual definition
when t is an integer and that the
same law of composition should hold.
Formally, it is sufficient to require
that

$$f^{(0)}(x) = x, \qquad f^{(1)}(x) = f(x)$$

and for every s and t ≥ 0,

$$f^{(s+t)}(x) = f^{(s)}[f^{(t)}(x)]$$

For integers, the generalized iterate
coincides with the usual notion, as you 
can see, by repeatedly applying the 
last condition to the second one. 

We have already presented a result 
of Koenigs which showed that if f is 
strictly monotonic and analytic,
$1 < f'(0) < \infty$, and 0 is the only fixed
point of f, then there exists a function
$\phi$ defined in terms of the iterates of
$f^{-1}$ such that

$$u_0(x) = \frac{\log \phi(x)}{\log f'(0)}$$

is a basic solution to Abel’s equation.
This means that $\phi$ is itself a solution
to what is called Schröder’s equation

$$\phi[f(x)] = f'(0)\,\phi(x)$$

which is obtained from Abel’s by taking
exponentials on both sides. Using
this fact and following Koenigs, it is
easy to show that $\phi^{-1}$ exists and that
the function

$$f^{(t)}(x) = \phi^{-1}\{[f'(0)]^t \phi(x)\}$$

satisfies the three conditions of a generalized
iterate. We show the latter.

First,

$$f^{(0)}(x) = \phi^{-1}[\phi(x)] = x$$

Second,

$$f^{(1)}(x) = \phi^{-1}[f'(0)\phi(x)]$$

And so, using the fact that $\phi$ satisfies
Schröder’s equation,

$$\phi[f^{(1)}(x)] = f'(0)\phi(x) = \phi[f(x)]$$

Hence,

$$f^{(1)}(x) = f(x)$$

Finally,

$$\begin{aligned}
f^{(s)}[f^{(t)}(x)] &= \phi^{-1}\big([f'(0)]^s\, \phi(\phi^{-1}\{[f'(0)]^t \phi(x)\})\big) \\
&= \phi^{-1}\{[f'(0)]^s [f'(0)]^t \phi(x)\} \\
&= \phi^{-1}\{[f'(0)]^{s+t} \phi(x)\} \\
&= f^{(s+t)}(x)
\end{aligned}$$

So, with this definition of the generalized
iterate, we can generalize the
above definition of distances in jnd’s 
to prescribe how to deal with fractional 
jnd’s. 
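
As an illustration of fractional iterates, take the hypothetical function f(x) = 2x + x² = (1 + x)² − 1 (an assumption made here for convenience, not an example from the text). For it, log(1 + x) satisfies Schröder's equation with f'(0) = 2, since log[(1 + x)²] = 2 log(1 + x), so the generalized iterate can be written down directly.

```python
import math

F_PRIME_0 = 2.0               # f'(0) for the hypothetical f(x) = 2x + x**2

def f(x):
    return 2 * x + x**2

def phi(x):
    return math.log1p(x)      # satisfies phi(f(x)) = 2 * phi(x) for this f

def phi_inv(z):
    return math.expm1(z)

def f_iter(t, x):
    """Generalized iterate: phi_inv(f'(0)**t * phi(x))."""
    return phi_inv(F_PRIME_0**t * phi(x))

x = 1.5
half_jnd_up = f_iter(0.5, x)            # the point half a jnd above x
print(f_iter(1.0, x), f(x))             # the first iterate reproduces f
print(f_iter(0.5, half_jnd_up), f(x))   # two half-steps make one whole jnd
```

Here `f_iter(0.5, x)` is the stimulus value that the definition places exactly half a jnd above x.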

We reformulate our major mathe- 
matical problem: 


Given a Weber function g
which is analytic, $g'(x) > -1$
for all x > 0, $g'(0) > 0$, and g(0)
= 0, to find those functions u(x)
such that $u[f^{(t)}(x)] - u(x) = t$,
for all x > 0, and all t ≥ 0, where
$f^{(t)}$ is the generalized iterate of
f(x) = x + g(x).


Note that by setting t = 1, this con- 
dition implies the equality of sensation 
jnd’s. 

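First, we show that $u_0(x) = \log \phi(x) / \log f'(0)$
solves the reformulated problem:

$$\begin{aligned}
u_0[f^{(t)}(x)] - u_0(x) &= \frac{\log \phi[f^{(t)}(x)]}{\log f'(0)} - \frac{\log \phi(x)}{\log f'(0)} \\
&= \frac{\log \{[f'(0)]^t \phi(x)\} - \log \phi(x)}{\log f'(0)} \\
&= \frac{t \log f'(0)}{\log f'(0)} \\
&= t
\end{aligned}$$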


Second, from the results about Abel's 
equation, we know that if there are 
any other solutions to this problem, 
they must be of the form $u_1 = u_0 + p(u_0)$,
where p is periodic with period
1. For $u_1$ actually to solve the reformulated
problem, it is necessary
that for every t ≥ 0,

$$\begin{aligned}
t &= u_1[f^{(t)}(x)] - u_1(x) \\
&= u_0[f^{(t)}(x)] + p\{u_0[f^{(t)}(x)]\} - u_0(x) - p[u_0(x)] \\
&= t + p[t + u_0(x)] - p[u_0(x)]
\end{aligned}$$

Thus, for every t ≥ 0,

$$p[t + u_0(x)] = p[u_0(x)]$$

That is, p must be periodic with every
period t, and so p is a constant. Thus,
up to an additive constant, $u_0$ is the
unique function which solves our re- 
formulated problem. 
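
The way the fractional-jnd condition rules out the periodic alternatives can be seen in the Weber's-law case, where the generalized iterate of f(x) = Kx is K^t x. The sketch below is illustrative only; K = 1.5 and the sine perturbation of amplitude 0.1 are assumptions chosen here.

```python
import math

K = 1.5                                   # hypothetical constant, f(x) = K * x

def f_iter(t, x):
    return K**t * x                       # generalized iterate for Weber's law

def u0(x):
    return math.log(x) / math.log(K)

def u1(x):
    # u0 plus a period-1 function of u0: still solves Abel's equation.
    return u0(x) + 0.1 * math.sin(2 * math.pi * u0(x))

def distance(u, t, x):
    """Scale distance between x and the point t stimulus jnd's above it."""
    return u(f_iter(t, x)) - u(x)

print(distance(u0, 1.0, 2.0), distance(u1, 1.0, 2.0))  # both are 1 up to rounding
print(distance(u0, 0.5, 2.0), distance(u1, 0.5, 2.0))  # only u0 gives 0.5
```

Both candidates pass the whole-jnd test, but only the unperturbed function assigns half a scale unit to half a jnd.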

In nonmathematical language, in- 
troducing the method of measuring 
fractional jnd’s has enabled us to 
eliminate all solutions to Abel’s equation
save $u_0$, thus cutting down the
number of acceptable solutions from 
infinity to one. 

We conclude, therefore, that the 
condition stated in the reformulated 
problem constitutes an acceptable 
definition of a psychophysical sensa- 
tion continuum, in the sense that it 
yields a unique Fechner function for 
any reasonable Weber function. We 
also find that for Weber's law this 
condition yields Fechner’s law. The 
solution of our reformulated problem 
may cause unhappiness because it 
is not the same as the integral “solution”
proposed by Fechner, except in
the special case of the linear generalization
of Weber’s law. However,
we have already shown that the integral
“solution” contradicts the definition
of equal sensation jnd’s.

It is sad that the integral is not the 
right solution, for its evaluation is 
often easy, and we fear that no working 
psychophysicist will find in our mathematics
a tool for determining a summated
jnd scale any better or more
efficient than the simple graphic pro- 
cedure of adding jnd’s up one at a 
time. 

The statistical nature of jnd's. So 
far we have sounded as though we 
were treating jnd’s as fixed quantities, 
although every psychophysicist knows 
that jnd’s are statistical fictions, de- 
fined by an arbitrarily chosen cutoff 
on a cumulative frequency curve. 
However, we now show that our
method of reducing the infinity of 
solutions to Abel’s equation to one is 
equivalent to treating jnd’s as just 
such statistical fictions. 

We start with the old, famous psy- 
chological rule of thumb: equally often 
noticed differences are equal, unless 
always or never noticed. We define 
P(y,x) as the probability that y is 
discriminated as larger than x. Now, 
this rule of thumb simply means that 
on the sensation continuum the func- 
tion P(y,x) is transformed in such a 
way that it no longer depends on x and 
y separately, but only on the differ- 
ence of their transformed values. 
Put another way, the subjective continuum
u is a strictly monotonic transformation
of the stimulus continuum
such that the probability that a
change of $\delta$ units on the sensation
scale will be detected depends only
upon $\delta$, and not on the place at which
$\delta$ begins or ends.

Formally, if we are at a point x of 
the stimulus continuum, and there- 
fore at u(x) on the sensation scale, 
and if a stimulus y is presented such 
that $u(y) = u(x) + \delta$, then the chance
that y will be detected depends upon
$\delta$, but not on x. If we note that

$$y = u^{-1}[u(x) + \delta]$$

then the condition is that

$$P\{u^{-1}[u(x) + \delta], x\} = P(\delta)$$


Our problem is to decide under what 
conditions this problem has a solution 
and what that solution is. To this 
end, we make the assumption that for 
each x, P(y,x) is a strictly monotonic 
increasing function of y. 

We show the following: If the 
above problem has a solution, then 
there exists a function f(x) such that
$P[f^{(t)}(x), x]$ is independent of x,
where $f^{(t)}(x)$ is the $t$th iterate of f(x)
previously defined. The function
f(x) − x is a Weber function naturally
defined in terms of P. If there
is a solution, it is unique and it is the
solution $u_0$ to Abel’s equation $u[f(x)] - u(x) = 1$.
In other words, if there
is any solution to the problem of 
equally often noticed differences being 
equal, then it is unique and it is the 
solution to our proposed reformula- 
tion of Fechner’s problem. 

The proof is comparatively simple 
and runs as follows. Suppose there 
exists a solution u to the condition 
that P{u[u(x) +6],x} is inde- 
pendent of x for all 5>0. Since P 
is strictly montonic in y for all x, 
there is a unique solution to P(y,x) 
=k for each k, 0 <k <1; call it 
y =fi(x). For any 4, let k = P(8), 
and so by our assumption u must 
satisfy 


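$$u^{-1}[u(x) + \delta] = f_\delta(x)$$

where we have written $f_\delta$ for $f_{P(\delta)}$.
Applying $u$ to this, we have

$$u[f_\delta(x)] - u(x) = \delta$$

Let $f = f_1$. We observe that if $\delta = 0$,
then $f_0(x) = x$. Suppose we choose
any $\delta, \epsilon > 0$ and let $y = f_\epsilon(x)$; then

$$\begin{aligned}
\delta &= u[f_\delta(y)] - u(y) \\
       &= u\{f_\delta[f_\epsilon(x)]\} - u[f_\epsilon(x)] \\
       &= u\{f_\delta[f_\epsilon(x)]\} - u(x) - \epsilon
\end{aligned}$$

Thus,

$$u\{f_\delta[f_\epsilon(x)]\} - u(x) = \delta + \epsilon$$

But, from above,

$$u[f_{\delta+\epsilon}(x)] - u(x) = \delta + \epsilon$$

so that

$$u[f_{\delta+\epsilon}(x)] = u\{f_\delta[f_\epsilon(x)]\}$$

whence

$$f_{\delta+\epsilon}(x) = f_\delta[f_\epsilon(x)]$$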


Thus, we have shown that $f_\delta$ must
satisfy the three conditions of a generalized
iterate of $f$, i.e., $f_\delta = f^{(\delta)}$ for
all $\delta$, so a necessary condition for a
solution is that

$$P[f^{(\delta)}(x), x]$$

shall be independent of $x$. From the
fact that $u[f_\delta(x)] - u(x) = \delta =
u[f^{(\delta)}(x)] - u(x)$, it follows that the
solution is unique and that it is the
same as that given for our reformulation
of Fechner’s problem.
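The argument can be checked numerically in the one case where everything is available in closed form. The sketch below assumes Weber's law, g(x) = kx, so that f(x) = (1 + k)x; the Weber fraction K = 0.1 and all function names are illustrative assumptions, not values or notation from the paper. Under that assumption u(x) = log x / log(1 + k) should satisfy Abel's equation, and the generalized iterate $f_\delta(x) = u^{-1}[u(x) + \delta]$ should obey the composition rule derived above.

```python
import math

K = 0.1  # assumed Weber fraction, chosen only for illustration


def f(x):
    """Stimulus one jnd above x under Weber's law g(x) = K * x."""
    return (1.0 + K) * x


def u(x):
    """Candidate scale: a solution of Abel's equation u[f(x)] - u(x) = 1."""
    return math.log(x) / math.log(1.0 + K)


def u_inv(s):
    """Inverse of u."""
    return (1.0 + K) ** s


def f_delta(delta, x):
    """Generalized iterate f_delta(x) = u^{-1}[u(x) + delta]."""
    return u_inv(u(x) + delta)


def abel_residual(x):
    """u[f(x)] - u(x) - 1, which should vanish for every x > 0."""
    return u(f(x)) - u(x) - 1.0
```

Here `f_delta(1.0, x)` reproduces `f(x)`, `f_delta(0.0, x)` returns `x`, and composing `f_delta` for two increments agrees with a single call on their sum, up to floating-point error.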

It probably is not obvious, but the 
point of this section extends beyond 
sensory psychophysics into the scaling 
procedures based on Thurstone’s law 
of comparative judgment. Case V of 
that law is based on the assumption 
that equally often noticed differences 
are equal unless always or never 
noticed. This fact has two interest- 
ing implications. The first and more 
obvious one is that these two apparently
different branches of psychological
measurement are actually
doing the same thing (namely, using 
a measure of confusion as a unit of 
measurement by assuming that con- 
fusion is equal at all places on the sub- 
jective scale). The second, less obvi- 
ous implication is that perhaps sensory 
psychophysics can profit by consider- 
ing, as Thurstone and his followers 
have, scaling methods with less rigid 
assumptions which nevertheless are 
based on confusability data. One of 
us (Luce) will pursue this possibility 
further in a forthcoming book³ (7).

Graphic methods for cumulating jnd’s.
Psychophysical data do not come in
mathematical form. In order to apply
our method for cumulating jnd’s
(or Fechner’s, for that matter), it is
necessary either to put the Weber
function into equation form, or else
to develop a graphic equivalent of the
appropriate mathematical operations.
The graphical equivalent of Fechner’s
technique is well known, although
rarely used (see, e.g., 15, pp. 94 and
147-148). It is, of course, wrong,
since Fechner’s technique is wrong.
If our technique is to be of greatest
applicability, we should provide a
graphic equivalent also. Unfortunately,
it seems difficult to find a truly
convenient one. The only method we
know of is to go back to the basic idea
of adding up jnd’s: the idea that one
jnd plus one jnd is two jnd’s. The
method of applying this basic idea is
given in Figure 1, and was discussed
earlier in the paper. Its error characteristics
are about the same as
those of the graphic techniques of
integration which have been used in
the past. Unfortunately, the method
is tedious; if there are 170 jnd’s
between absolute threshold and the
upper limit of discrimination, then 170
separate operations are required to
determine the cumulated-jnd scale.
The errors in these successive operations
do not multiply, however.

FIG. 1. How to cumulate jnd’s. The size
of the jnd at the origin is marked off on the x
axis to find point A, the size of the jnd at A is
marked off to find point B, and so on. The
stimulus values A, B, C, . . . correspond to
the points 1, 2, 3, . . . on the cumulated jnd
scale.

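The step-by-step construction of Figure 1 is easy to state as a loop: starting at some stimulus value, add the jnd at that value to reach the next point, and repeat. The sketch below is only an illustration of that idea. The function names, the constants K and C, and the starting value are hypothetical, and the linear Weber function g(x) = kx + c is used because in that case a closed-form scale exists against which the loop can be checked.

```python
import math


def cumulate_jnds(g, x0, n_steps):
    """Return stimulus values x0, x1, ..., xn with x_{i+1} = x_i + g(x_i)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + g(xs[-1]))
    return xs


K, C = 0.1, 0.5  # hypothetical constants of the linear Weber function


def g_linear(x):
    """Linear generalization of Weber's law, g(x) = K*x + C."""
    return K * x + C


def u_linear(x):
    """Closed-form scale for g_linear; it rises by exactly 1 per jnd step."""
    return math.log(K * x + C) / math.log(1.0 + K)
```

Calling `cumulate_jnds(g_linear, 1.0, 170)` performs the 170 separate operations mentioned in the text; the i-th entry of the result is the stimulus value lying i jnd's above the starting point.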
Practical effects of the new procedure.
No doubt it is important to under- 
stand Fechner’s logical error and to 
know how to avoid it, but the burning 
question for working psychophysicists 
is: What, if anything, does this do to 
the currently accepted conclusions
about the uselessness of adding up
jnd's? 

First, it is easy to show that under 
some circumstances the difference 
between integration and the functional-equation
solution is substantial. Consider the class of Weber
functions $g(x) = ax^{1+\epsilon}$: if $\epsilon$ is greater
than zero, the asymptotic error of
the integral solution as $x$ approaches
infinity is infinite; while if $\epsilon$ is less than
zero, the asymptotic error is zero.
Of course, if $\epsilon$ equals zero (Weber's
law), the two procedures give identical
results. The order of magnitude of
the error for small numbers of jnd’s
depends on the constants in the equation;
it can be of significant size even
if $\epsilon$ is less than zero. One way of
looking at it is that the integral solution
is the approximation given by
the first two terms of a Taylor series
expansion of the functional equation;
all square and higher power terms of
the expansion are omitted:

$$\begin{aligned}
u[x + g(x)] - u(x) &= 1 \\
&= u(x) + u'(x)\,g(x) + \frac{u''(x)\,g(x)^2}{2!} + \frac{u'''(x)\,g(x)^3}{3!} + \cdots - u(x)
\end{aligned}$$
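The divergence between the two procedures for $\epsilon > 0$ can be seen in a small numerical sketch. It assumes the Weber function $g(x) = ax^{1+\epsilon}$ from the text; the function names and the particular values a = 0.1, $\epsilon$ = 0.5, and starting stimulus 1.0 are illustrative assumptions only. The integral of dx/g(x) from 1 upward can never exceed $1/(a\epsilon) = 20$, whereas the count of jnd steps keeps growing as the target stimulus is raised.

```python
def jnd_steps_to_reach(a, eps, x0, x_target):
    """Count jnd steps x -> x + a*x**(1+eps) needed to go from x0 to x_target."""
    x, n = x0, 0
    while x < x_target:
        x += a * x ** (1.0 + eps)
        n += 1
    return n


def fechner_integral(a, eps, x0, x):
    """Integral of dx / (a*x**(1+eps)) from x0 to x, valid for eps != 0."""
    return (x0 ** (-eps) - x ** (-eps)) / (a * eps)
```

Comparing `jnd_steps_to_reach(0.1, 0.5, 1.0, x)` with `fechner_integral(0.1, 0.5, 1.0, x)` for increasingly large `x` shows the integral count saturating while the step count continues to rise.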


A number of experimental deter- 
minations of jnd’s, particularly for 
intensive continua, produce a curve of 
g(x)/x that first falls and then is flat 
—a function often well approximated 
by g(x) = kx +c. However, for 
some continua the picture is less sim- 
ple. There are some (pitch, for ex- 
ample) where the curve appears to 
rise again at the high end. The falling
section of these curves corresponds
to the case $\epsilon < 0$; the flat section corresponds
to the case $\epsilon = 0$; the rising
section corresponds to the case $\epsilon > 0$.
However, the x-axis of such graphs
is usually plotted logarithmically. 
This means that the rising section may 
cover most of the range within which 
the stimulus can be varied—a fact 
which the logarithmic x-axis tends to 
conceal. So it is quite possible that 
the error in using the integration tech- 
nique is substantial for many sense 
modalities and for large ranges within 
each. 

But the possibility of error is ir- 
relevant unless someone has actually 
made the error. Has anyone? Ex- 
tensive examination of the literature 
suggests that the answer is that not 
very many such errors have occurred. 
Some authors are quite unclear about 
how they added up jnd’s, but many of 
them have preferred the step-by-step 
method which corresponds to the 
functional-equation solution because 
it was very simple to do. How 
simple it is, of course, depends on the 
number of jnd’s to be added; we 
doubt very much if the jnd’s for pitch 
will ever be added this way, since there 
are several thousand of them. We 
have found only one clear instance 
(15) in which the graphic equivalent 
of integration has been used (to cum- 
ulate pitch jnd's, as it happens), 
though it has been vigorously recom- 
mended. The general avoidance of 
the graphic equivalent of integration 
may be caused by shrewd intuition 
that something is wrong with Fech- 
ner’s mathematical auxiliary princi- 
ple. Or it may simply be a rare 
instance in which the fear of mathe- 
matical complexity has benefited sci- 
ence. 

Do cumulated jnd’s agree with other 
scales? The results of cumulating 
jnd’s have often been compared with 
the results of other psychophysical 
procedures (4). The most common 
finding has been that the cumulated 
jnd scales do not agree with scales determined
by fractionation or direct
magnitude estimation, at least for 
such continua as loudness. A review 
of this literature might seem appropri- 
ate here, but it is quite unnecessary, 
since the relation between scales based 
on confusion data (like cumulated 
jnd scales) and those based on frac- 
tionation or magnitude estimation 
has been extensively and excellently 
discussed in recent studies by Stevens 
(14), Stevens and Galanter (16), and 
Piéron (8, 9, 10). 

The controversy over the relation 
between cumulated jnd scales and 
scales determined by other methods is 
embedded in a larger, sometimes acri- 
monious controversy about the rela- 
tionships among various methods of 
sensory scaling. To some extent we 
shall have to enter the fray. 

The first and most important question
is this: Do the different scaling
procedures, if properly used, lead to
different scales? Unless we reject a
great many experiments as improperly
performed, we must answer “Yes.”
But the issue is not as simple or unambiguous
as that answer. For example,
Garner (3) has developed a
loudness scale based on both fractionation
and multisection judgments that
fits a large number of experimental
results in auditory psychophysics
better than does the old sone scale
(his paper was written prior to the
development of the new sone scale
[13]). Figure 2 shows the relationship
between that scale and a cumulated
jnd scale for loudness prepared
by us from Riesz’s data (11, 12). The
two scales seem to be roughly linearly
related, but does it mean anything
for the controversy? Riesz’s procedure
has often been criticized, and his
data are almost 30 years old. The
form of Garner’s scale (which is all
that matters for this argument) is
based primarily on his multisection
rather than his fractionation data.
Scales based upon multisection data
usually agree with those constructed
by confusability methods; the explanation
proposed by critics of these
methods is that the adjustment of five
or six stimuli in a multisection experiment
may produce confusion among
the tones being adjusted. If this
argument is correct, and if the form
of Garner’s scale is based upon multisection
data, it is not surprising that
the two agree. Our reason for so
extensive a discussion of Garner’s
scale is that loudness is the central
battleground of this controversy. If
the verdict of psychophysical history
is that confusability and multisection
scales give results different from fractionation
results for loudness, then
psychologists will almost certainly
assume that the two procedures yield
different results in other intensive (or,
as Stevens calls them, Class I or
prothetic) continua. Unfortunately,
even in psychophysics, not enough universally
accepted data are available to
settle the argument.

FIG. 2. The relation between Garner’s
loudness scale and Riesz’s cumulated jnd
scale. The old sone scale (ASA loudness
scale) and Stevens’s recent revision of it are
included for comparison.

If confusability scales and scales 
based upon fractionation or direct 
magnitude estimation agree, no problem
arises. If not (and we suspect
they will not), psychophysicists must
still evaluate each kind of procedure 
and its resulting scale. Some psycho- 
physicists feel that fractionation and 
magnitude estimation have great face 
validity, and that confusability scales 
are distortions of the scales obtained 
by these procedures. They say that
fractionation and estimation scales
correspond to what Ss say they feel,
that they are obtained by straightforward
procedures rather than indirect ones,
and, after all, what logic is there in
basing a measure of magnitude on
variance or “noise”?

Other psychophysicists feel that 
confusability scaling is the better 
method. They say that fractionation 
and estimation data are unreliable, 
variable, and, as a rule, at least frac- 
tionation data cannot be turned into 
scales unless obtained from a ‘‘good,” 
which means extensively trained, subject.
The estimation techniques have
not been used enough times in enough
places to indicate clearly what effect,
if any, training may have on the results.
Confusability scales can be
obtained from untrained Ss who have 
no idea what form of scale is wanted 
from them; they can even be obtained 
from animals. 

Each group asserts that its preferred 
scales are more nearly consistent with 
the bulk of psychophysical data than 
the other kind of scales; each group 
can produce impressive arguments to 
buttress its claim. 

Still another position is possible: 
perhaps two different kinds of sen- 
sory processes are being tapped by 
these two different kinds of proce- 
dures. If so, both kinds of scales are 
useful, but for different purposes. 
This could well be the eventual end- 
point of the argument. 

Yet another source of confusion in 
the argument is the treatment of individual
differences. The custom has
been to take means or medians, and
recently a number of psychophysicists 
have raised vigorous questions about 
the appropriateness of doing so. 
W. J. McGill⁵ is currently attempting
to find a better way of respecting individual
differences while still obtaining
a “universal” scale. It will be
interesting to see what light serious
attempts to do justice to individual
differences shed on the differences
between the two classes of scales.

The status of cumulated jnd’s has 
been controversial for more than a 
hundred years, and this paper is not 
intended as an attempt to settle the 
controversy. Our main point is that 
Fechner’s problem has been improp- 
erly formulated and that the integral 
usually offered as a solution is not in 
fact a solution when the Weber func- 
tion differs from g(x) = kx +c. We 
have also developed what appears to 
be the correct solution, only to find 
that in computational work it has 
usually been used in spite of its dis- 
agreement with the integral solution. 
This means that our clarification of 
the logical issues underlying Fech- 
ner’s formulation does little to change 
the status of the present, primarily 
empirical, controversy about scaling 
methods. However, one of us (Luce 
[7]) has recently developed a way of 
dealing with confusability data based 
on a simple axiom which, if it works 
out successfully, may resolve the 
difficulty by changing our ideas about 
the meaning of confusability scales; 
this development will be described in 
another publication. 


⁵ W. J. McGill, personal communication,
1957.


SUMMARY


Fechner’s method for adding up
just noticeable differences (jnd’s) to
obtain sensory scales is based on a
mathematical error: he used a differential
equation approximation to a
functional equation instead of the 
functional equation itself. The func- 
tional equation can, however, be 
solved directly. The solution coin- 
cides with the differential equation 
solution only in the special case in 
which the linear generalization of 
Weber’s law holds exactly. The 
mathematical properties of the formal 
solution are such that it probably 
will not be very useful for practical 
computation, but the extremely sim- 
ple graphical procedure of adding up 
jnd’s one at a time is the graphical 
equivalent of the mathematically 
correct solution. The amount of 
difference between the two proce- 
dures can be calculated for some 
special cases; its size depends on the 
form of the function relating size of 
jnd’s to stimulus magnitude. 

This error does not seem to have 
any significant impact upon the con- 
troversy over the relation between 
cumulated jnd scales and scales based 
on fractionation and direct estimation 
data because most psychophysicists 
have, in fact, ignored the recom- 
mended (incorrect) procedure and 
have stubbornly summated jnd’s in 
the obvious and correct way.


STRENGTH OF CARDIAC CONDITIONED RESPONSES WITH
VARYING UNCONDITIONED STIMULUS DURATIONS ¹


The effects of long and short shocks
on the conditioning of fear have been 
frequently considered important tests of 
theory of how fear is learned (1, 2, 3, 4, 
9,10). N. E. Miller, for example, has 
stated that according to a strict drive- 
reduction theory of reinforcement “. . . 
other things equal, a signal followed by 
a brief noxious stimulus should acquire 
the capacity to elicit stronger fear than 
one followed by a prolonged noxious 
stimulus” (2, p. 375). 

Despite its theoretical significance, 
there have been few direct experimental 
attacks on this problem. We have found 
only two published conditioning studies 
with duration of noxious US as a major 
variable, one by Bitterman, Reed and 
Krauskopf (1) and the other by Mowrer 
and Solomon (4). Each of these stud- 
ies examined only two values of shock 
duration over a relatively narrow range. 

More extensive data on this problem 
can be extracted from a series of cardiac 
conditioning experiments we have done 
exploring the effects of duration of shock 
on the form of the conditioned heart re- 
sponse (8, 9, 10). These studies pro- 
vide us with an assemblage of data, as 
yet unreported, on the relation of shock 
duration to strength rather than form
of CR.

If heart rate change during anticipation
of a shock is accepted as an index
of fear, our classical conditioning studies
using shocks of 0.1, 2.0, 6.0, and 15
sec. allow us to make a direct and
thorough empirical check on Miller’s
statement of the drive reduction view.
We shall first summarize our own experiments.

¹ This is Technical Report No. 15 under


METHOD 
General 


The apparatus, procedure, and Ss have all 
been described in detail in previous papers 
(8, 9, 10). They are summarized as follows. 
Forty-three male and twenty female college 
student Ss were assigned unsystematically to 
four groups, each run under a classical trace- 
conditioning procedure with a different shock 
US duration (0.1, 2.0, 6.0, and 15 sec.). All 
groups received initially at least ten well distributed
preconditioning trials of the CS alone,
a 1-sec. tone (60 db, 512 c.p.s.). Following
these were at least ten spaced conditioning 
trials consisting of the tone CS followed in 6 
sec. by the shock US (13 V.A.C.). This mild 
but reportedly unpleasant shock was applied 
across the first two fingers of the left hand. 
Heart rate was measured with an electrocardi- 
ograph before, during, and after each trial. 


Measures of CR and UR 


Since individual differences in the form of 
conditioned cardiac responses occur, a measure 
of heart rate disturbance was chosen which 
would be comparable for the different types of 
response. This was a variability measure, the 
statistical range of heart rates during the con- 
ditioning (posttone) period. It was simply 
derived as follows. For a single S on every 
conditioning trial each heart beat interval be- 
tween the tone and shock was converted to a 
rate measure. There were usually about eight 
or nine of these. The difference between the 
largest and smallest of them defined the range 
for that trial, a measure which we previously 
have designated as Maximum CR. A mean is 
taken over the first ten conditioning trials for 
this measure to represent a single S. 

The same steps as above are taken to compute
the Maximum UR except that ten beats
after the shock onset are used instead of the
eight or nine before.
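The derivation of the Maximum CR and Maximum UR measures described above can be sketched in a few lines. This is only an illustration under stated assumptions: beat intervals are taken to be in seconds and rates in beats per minute, and the function names are hypothetical rather than the authors' own.

```python
from statistics import mean


def max_response(beat_intervals_sec):
    """Range (max - min) of beat-by-beat heart rates, in beats per minute."""
    rates = [60.0 / interval for interval in beat_intervals_sec]
    return max(rates) - min(rates)


def subject_score(trials):
    """Mean range over a subject's trials (the first ten, per the text)."""
    return mean(max_response(trial) for trial in trials[:10])
```

For the CR measure each trial would hold the eight or nine intervals between tone and shock; for the UR measure it would hold the ten intervals after shock onset.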


RESULTS 


Figure 1 presents the Maximum CR 
and Maximum UR for four groups hav- 
ing different shock durations. Each 
point is the mean of 15 Ss’ data (3 Ss, 
selected without bias, were omitted to 
equalize the Ns in the groups for statis- 
tical convenience). Maximum CR does 
not seem to vary systematically as a 
function of shock US duration. What 
fluctuations do occur are easily attribu- 
table to the corresponding variation in 
Maximum UR. This fact is established 
by Table 1, which gives the results of 
an analysis of covariance of the Maxi- 
mum CR function, using Maximum UR 
as the relevant covariate. When the in- 
fluence of the unconditioned response is 
extracted from the conditioned response 
there is left an insignificant amount of 
CR variation among the shock condi- 
tions. 

The absence of significant differences
among the group means is not to be interpreted
as due to absence of conditioning.
Evidence has already been presented
(8, 9, 10) to show that a significant
amount of cardiac conditioning
took place in all four groups, using other
(correlated) measures of response. To
demonstrate further that the CRs and
URs in Fig. 1 represent reliable disturbances
in rate, we have compared
them with similar measures of pretone
variability, i.e., the range of ten pretone
rate measures. Table 2 presents the
results of a series of t tests of the significance
of the differences between mean
pretone range and mean posttone range
(for CR), and between mean pretone
range and postshock range (for UR). Each t
is based on 15 differences, each difference
itself a mean (over 10 trials) for
each of 15 Ss. Pretone means were
small, less than 1.5 B/M for all groups,
in comparison to the posttone and postshock
means shown in Fig. 1.

TABLE 1

SUMMARY OF ANALYSIS OF COVARIANCE OF
MAXIMUM CR AMPLITUDE WITH
MAXIMUM UR AMPLITUDE

[Table body not legible in the source; the surviving column
header is "df" and the surviving row label is "Between durations".]

For 3 and 52 df, F is 2.79 for p = 5%.
The variate-covariate correlation is +.57.
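The t tests described above operate on one difference score per S. As a rough sketch, and assuming the authors used the ordinary paired (difference-score) t test, which the description suggests but does not state outright, the statistic can be computed as follows. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(differences):
    """t statistic for H0: mean difference = 0, with n - 1 degrees of freedom."""
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1
```

With 15 Ss per group, each entry of `differences` would be one S's mean posttone (or postshock) range minus the same S's mean pretone range, giving 14 degrees of freedom.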


DISCUSSION 


We wish to discuss our results briefly
in relation (a) to previous findings and
(b) to Miller’s theoretical statement.

TABLE 2

PROBABILITY OF CHANCE OCCURRENCE OF MEAN
DISTURBANCE AND BOTH CR AND UR
FOR THE FOUR DURATIONS
OF SHOCK US

[Table body not legible in the source.]

FIG. 1. The maximum amplitudes (range)
of CR and UR are plotted as a function of
shock US duration. One standard deviation
is indicated by a vertical line on either side of
each of the mean values comprising the points
of the CR function.

The flat gradient in Fig. 1 relating
strength of CR to US duration is not 
inconsistent with the findings of Bitter- 
man, Reed, and Krauskopf (1), who 
found no significant difference in the 
strength of conditioned GSR with shock 
duration of .5 and 3 sec. This lack of 
difference, however, may have appeared 
because they used the same Ss for both 
durations. Their experiment employed 
different signals for short and long 
shocks; and, if the Ss discriminated the 
signals verbally, they were assumed not 
to have generalized autonomically (i.e., 
GSR). The weakness of this assump- 
tion is shown by the fact, reported by 
Notterman, Schoenfeld and Bersh (5), 
that Ss who state they expect no shocks 
during experimental extinction continue 
to show conditioned cardiac disturbance. 

Mowrer and Solomon (4) paired a 
light signal with shocks of 3 sec. for 
one group of rats and with shocks of 
10 sec. for another group. The strength 
of fear conditioned to the light was
measured by the capacity of the light
to act as effective punishment of a lever-pressing
response. For these two durations
of shock, an abrupt onset and
termination of shock was employed. No
significant difference between the two
groups was found with respect to their
inferred fear. Two further groups were
run under a similar procedure, except 
that shock was terminated gradually 
rather than abruptly. For one group 
the shock was 4 sec. long and for the 
other, 7 sec. No reliable difference in 
fear was reported between these groups 
either. Despite the wide differences in 
Ss, procedures, and methods of inferring 
fear, the results of the Mowrer and 
Solomon study are in agreement with 
our own findings. Such agreement gives 
us confidence that there is no relation- 
ship between indices of fear and noxious 
stimulus durations. 

The discrepancy between the empirical 
findings and Miller’s theoretical formulation
of the drive-reduction view deserves
comment. It will be noted that
Miller prefaced his statement that short 
noxious stimuli would produce more fear 
than long ones with the phrase, “other 
things equal.” There are at least two 
important “other things” here; the first 
is strength of US, or more precisely, 
subjective strength. This amounts to 
a statement of the need to control the 
strength of the shock, or other noxious 
stimulus, as it feels to the subject. If, 
through temporal summation, the longer 
shocks are felt as stronger, then stronger 
fear might result from them despite the 
correctness of Miller’s statement, be- 
cause more intense unconditioned stimuli 
are known to create stronger CRs (6, 7). 
The second important factor in need of 
control is adaptation to shock. If there 
is a period of rapid adaptation during 
the early part of a prolonged shock, it 
might produce the major part of drive 
reduction necessary to condition fear. 
An effect of this kind shortly after shock 
onset could make the length of the physi- 
cal shock quite irrelevant to the timing 
of the drive reduction. 

With respect to the first point, tem- 
poral summation, we have one way of 
assessing the apparent or subjective in- 
tensity of shock, that is, by the maxi- 
mum or total magnitude of the uncondi- 
tioned response. Our results show that 
there were no systematic differences in 
the magnitude of UR for the different 
shock durations. Hence, we have some 
evidence, at least, that the over-all sub- 
jective intensity of the US was rela- 
tively constant for the various dura- 
tions. We conclude from this that the 
discrepancy between Miller’s theoretical 
statement and our data can not easily 
be attributed to lack of control of shock 
intensity. 

With respect to the second factor, 
shock adaptation, we appeal this time to 
the form of the unconditioned response 
as a relevant datum. It is true that
adaptation to the shock occurs if momentary
magnitude of the UR reveals
subjective shock intensity. We have
shown (8, 9, 10) that regardless of shock
duration in the 0.1 to 15 sec. range the
heart-rate response to shock takes the 
form of a rapid acceleration to a maxi- 
mum rate within 4 beats of shock on- 
set, followed by a slow irregular return 
(adaptation) to normal requiring more 
than 20 beats. If the drive property 
of shock parallels the course of UR mag- 
nitude in time, we would be led to say 
that drive reduction is remarkably un- 
related to shock duration in the range 
investigated. In terms of Miller’s ap- 
plication of Hull’s theory, this would 
mean that no differences in condition- 
ing of fear should occur because sub- 
jective shock intensity reduction (the 
presumed reinforcing agent) was equally 
delayed for all groups. If this line of 
reasoning is correct, our experiment un- 
fortunately provides no real test of a 
theory that predicts the consequences of 
varying delays of reinforcement. 

This difficulty, moreover, may not be 
restricted to the present experiment. As 
long as there is no demonstrable rela- 
tionship between objective and subjec- 
tive properties of shock, the possibility 
of a simple experimental test of Miller’s 
assertion is precluded. Use of a stronger 
shock (if Ss would endure long dura- 
tions) might conceivably eliminate the 
troublesome adaptation effect, but only 
at the cost of a corresponding increase 
in the likelihood of destroying our pres- 
ent control of temporal summation. 

The present data may be accounted 
for by two-factor or contiguity theories 
which stress the importance of the on- 
set rather than the offset of shock (3, 
4), but the data are not critical tests of 
this type of theory. 


SUMMARY 


No relationship has been found in 
human Ss between conditioned heart
rate response magnitude and a wide
range of shock US durations. 

Under the assumption that cardiac 
disturbance is an index of fear, this fact 
is related to a deduction from a drive- 
reduction theory of fear, according to 
which a signal followed by a brief noxious
stimulus should acquire the capacity
to elicit stronger fear than one followed 
by a prolonged noxious stimulus. It is 
concluded that this proposition is either 
incorrect or not testable under the conditions
provided.


STIMULUS AND RESPONSE GENERALIZATION:
GRADIENT FROM A TRACE MODEL ¹


It is now generally acknowledged
that (a) a response conditioned to one
stimulus tends also to occur to other
stimuli, and (b) the magnitude of this
response-tendency (for any particular
one of those stimuli) is governed by
the dissimilarity between that stimulus
and the stimulus to which the
response was originally conditioned.
Indeed, this principle of stimulus generalization
is of such fundamental
importance that any quantitative
theory of behavior that fails to deal
with it explicitly can only be regarded
as incomplete. It is not surprising,
therefore, that a number of investigations
have been specifically concerned
with the form of the function relating
generalized response-tendency to interstimulus
dissimilarity. What may
seem surprising, though, is that the
conclusions of these various studies,
far from converging on a unique function,
have diverged to the several
different functions illustrated in Fig.
1. Beyond those who doubt the
existence of a quantitatively invariant
“gradient of generalization” in the
first place (5, 16, 21), there are those,
like Schlosberg and Solomon, who
consider this gradient to be linear, as
in A (8, 14, 22, 23); those, like Spence,
who consider it to be convex upward,
as in B (27, 28); and those, like Hull,
who consider it to be concave upward,
as in C (1, 11, 12, 19, 20). In addition,
if the “discriminal dispersion” of
the psychophysicists can be interpreted
as a gradient of generalization,
there is support for a bell-shaped function
which is first convex and then
concave upward as shown in D (9, pp.
317-319).

In addition to stimulus generalization,
an analogous phenomenon of
response generalization is sometimes
supposed to operate so that (a) a
stimulus to which one response has
been conditioned tends also to evoke
other responses, and (b) the magnitude
of this tendency (for any particular
one of those responses) is governed by
the dissimilarity between that response
and the response originally
conditioned. Although this principle
of response generalization is also of
considerable theoretical importance,
even less progress has been made toward
the quantitative determination
of the shape of the gradient in this
case than in the case of stimulus generalization.

FIG. 1. Several proposals for the form of
the gradient of stimulus generalization. The
abscissa in each case is some measure of the
dissimilarity between two stimuli, and the
ordinate is some measure of the tendency of a
response, conditioned to one of the two stimuli,
to follow the presentation of the other.

Actually, that a unique gradient of 
generalization has failed to emerge 
both in studies of stimulus generaliza- 
tion and in studies of response general- 
ization no longer seems so surprising 
when one examines these studies in 
detail. For although the obtained 
gradient must depend crucially upon 
the choice of the independent variable 
(i.e., dissimilarity), the various in- 
vestigators have never been able to 
agree on any one measure of dissimi- 
larity as appropriate. Furthermore, 
since tests for generalization have been 
conducted during various stages of 
discrimination learning, as well as 
after various periods of extinction, 
one must consider the possibility that 
the form of the gradient may change 
radically under different conditions of 
reinforcement (13, 31). 

This paper will attempt to show 
that, by approaching the generaliza- 
tion problem from a somewhat differ- 
ent direction, considerable evidence 
can be adduced for the proposition 
that the gradients of stimulus and re- 
sponse generalization both conform to 
an exponential decay function (Curve 
C of Fig. 1). Further evidence will 
be presented to indicate that the form 
of the gradient does indeed depend 
upon the schedule of reinforcement 
and, more particularly, that it changes 
from the exponential to a bell-shaped
function (Curve D) as nonreinforced
trials are introduced with greater and 
greater frequency. 


THE EXPONENTIAL GRADIENT IN
PAIRED-ASSOCIATE LEARNING


In view of the difficulties attending 
any effort to establish one measure of 
dissimilarity as a standard, the follow- 
ing strategy was recently proposed: 
Stimuli (or responses) were conceptualized
as points in a “psychological
space” in such a way that the distance
between any pair of these points
represented the psychological dissimi- 
larity between the corresponding pair 
of stimuli (or responses). By taking 
account of a set of metric axioms which 
any measure of distance should satisfy, 
it was shown that hypotheses about 
the shape of the gradient of generaliza- 
tion could be tested without resorting 
to an independent measure of dissimilarity
(25). In a series of experiments
on stimulus and response generaliza- 
tion during paired-associate learning, 
substantial support was obtained for 
the hypothesis that generalization is 
an exponential decay function of psy- 
chological distance (26). Since alter- 
native hypotheses were not tested, 
however, it seems desirable to present 
data in a form that will clearly rule 
out the Functions A, B, and D of Fig. 
1. 

Now in a paired-associate experiment
there are $N$ stimuli, $S_1, S_2, \ldots, S_N$,
to which are assigned $N$ responses,
$R_1, R_2, \ldots, R_N$. On each trial one of
the N stimuli is presented to the sub- 
ject who must in turn produce one of 
the N responses. Various procedures 
of differential reinforcement can be 
used to communicate to the subject 
the prevailing assignment of the re- 
sponses to the stimuli. Sometimes 
the subject is simply informed after 
each trial whether his response was or
was not the correct one (i.e., the one
assigned to the stimulus presented on
that trial); at other times more elaborate
methods of “correction” are used
(26). In any case the so-called as- 
signment is simply a rule which the 
experimenter follows in delivering the 
reinforcements, and any arbitrary 
one-to-one assignment can be estab- 
lished in this way. 

Consider then a paired-associate
experiment in which the responses are 
highly distinctive and so lead to neg- 
ligible amounts of generalization. If 
there are N pairs consisting of one 
stimulus and its assigned response, 
the data from such an experiment can 
be represented by an $N \times N$ matrix
giving, for every $S_i$ and $S_k$, the conditional
probability $P_{ik}^S$ with which $S_i$
leads to the response assigned to $S_k$
(25). A very basic measure of stimulus
generalization between $S_i$ and $S_k$ is
then provided by either of the probabilities
$P_{ik}^S$ or $P_{ki}^S$. Comparison
between different experiments is facil- 
itated, however, by adjusting these 
measures so that the generalization 
between any stimulus and itself is 
always unity. This can be done by
replacing the absolute probabilities
$P_{ik}^S$ and $P_{ki}^S$ with the ratios
$P_{ik}^S/P_{ii}^S$ and $P_{ki}^S/P_{kk}^S$. Furthermore,
these two ratios can be averaged together
to furnish a single, more stable
estimate of the generalization between
$S_i$ and $S_k$. Since there are theoretical
reasons for preferring the geometric
to the arithmetic mean (25), the measure
of stimulus generalization might
be defined by the formula

$$\sqrt{\frac{P_{ik}^S\, P_{ki}^S}{P_{ii}^S\, P_{kk}^S}}$$

Owing to the gradual manner in
which differential reinforcement takes
effect, however, there is an initial
phase during which the subject responds
more or less randomly. This
means that the function given in the 
above formula necessarily levels off at 
some positive asymptote for large 
interstimulus distances. Again, in 
order to compare data from different 
experiments, this asymptote must be 
brought down to zero for each experiment.
This is accomplished by estimating
a parameter $C^S$ from each set
of data (25), and by defining the generalization
between $S_i$ and $S_k$ to be

$$G_{ik}^S = (1 + C^S)\sqrt{\frac{P_{ik}^S\, P_{ki}^S}{P_{ii}^S\, P_{kk}^S}} - C^S \qquad [1]$$

Likewise, in an experiment with
highly discriminable stimuli, the response
generalization between $R_i$ and
$R_k$ will be given by

$$G_{ik}^R = (1 + C^R)\sqrt{\frac{P_{ik}^R\, P_{ki}^R}{P_{ii}^R\, P_{kk}^R}} - C^R \qquad [2]$$

where $P_{ik}^R$ is the conditional probability
of $R_k$, given the stimulus assigned
to $R_i$ (25).
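
As a minimal sketch of Equations 1 and 2, the following Python function computes the generalization measure from a matrix of conditional probabilities. The function name, the 3 x 3 matrix, and the value chosen for the constant are illustrative assumptions, not data from the experiments cited.

```python
import math


def generalization_matrix(P, C=0.0):
    """Apply Equation 1 (or 2) to a square matrix of conditional probabilities.

    P[i][k] is the probability that stimulus i evokes the response assigned
    to stimulus k. C is the asymptote-correcting constant (hypothetical here).
    """
    n = len(P)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            # Geometric mean of the two normalized ratios P_ik/P_ii and P_ki/P_kk.
            ratio = (P[i][k] * P[k][i]) / (P[i][i] * P[k][k])
            G[i][k] = (1.0 + C) * math.sqrt(ratio) - C
    return G


# Hypothetical confusion matrix; each row sums to one.
P = [
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.15, 0.80],
]
G = generalization_matrix(P, C=0.0)
```

By construction the diagonal of `G` is unity and the matrix is symmetric, which is the property the normalization in the text is meant to secure.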

Equations 1 and 2 then specify the 
dependent variables for paired-associ- 
ate experiments, i.e., stimulus and re- 
sponse generalization. With regard 
to the independent variables, i.e., 
interstimulus and interresponse dis- 
tances, it is clear that physical meas- 
ures are not directly applicable. 
Thus the psychological distance be- 
tween two tones at a fixed difference 
in intensity changes as both tones are 
increased in intensity. But this 
does not mean that there is no relation 
between physical and psychological
distance. On the contrary, at least
the following two statements can be
made: (a) two tones which are brought
arbitrarily close together in terms of
physical measures also approach each
other psychologically; (b) as two tones
separated by a fixed difference in
intensity are increased in intensity, 
the psychological distance (although 
it does not remain fixed) nevertheless 
changes in a gradual and continuous 
manner. These considerations form 
a basis for the assumption that the 
locus of a set of stimuli in psychologi- 
cal space can always be obtained from 
their locus in physical space by a 
transformation that is (a) continuous 
and (b) differentiable (25).

Although the stimuli and responses 
chosen for experiments on paired- 
associate learning have typically been 
words or nonsense syllables, nonverbal 
materials can be used just as well. 
Indeed, for the purposes of establish- 
ing a measure of the psychological 
distance between stimuli, it is espe- 
cially convenient to choose stimuli 
(such as tones differing only in in- 
tensity) that can be varied along a 
single physical dimension. In gen- 
eral, of course, stimuli that are evenly 
spaced along a single physical dimen- 
sion will be neither evenly spaced nor 
confined to a straight line in psy- 
chological space. Nevertheless, it fol- 
lows from the assumptions of contin- 
uity and differentiability that, if 
stimuli are evenly spaced over a 
sufficiently small range of a single
physical dimension, then they will be 
spaced approximately evenly along an 
approximately straight line in psy- 
chological space. 

Suppose, then, that the N stimuli of 
a paired-associate experiment satisfy 
this special condition which, for pres- 
ent purposes, will be called the linear- 
ity condition. Such stimuli can be 
designated as $S_1, S_2, \ldots, S_N$ in such
a way that the subscripts correspond 
to the ordinal positions of the N 
stimuli along the common (approxi- 
mately straight) line. The average 
generalization for all pairs of stimuli
just $D$ steps apart along this line will
then be given by

$$G(D) = \frac{1}{N - D} \sum G_{ik} \quad (\text{with } i - k = D) \qquad [3]$$

where, as indicated, the summation is
carried out over the $N - D$ stimulus
pairs separated by just $D$ steps. Similarly,
if the responses meet the linearity
condition, this formula can be used
as a measure of the average generalization
for all pairs of responses separated
by $D$ steps. Thus, $G_{ik}$ in
Formula 3 can stand for either $G_{ik}^S$ or
$G_{ik}^R$.
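
The averaging in Formula 3 can be sketched as follows. This is an illustration only: the function name and the 4 x 4 matrix of generalization values are assumptions made for the example.

```python
def average_generalization(G, D):
    """Formula 3: mean of G[i][k] over the N - D pairs exactly D steps apart."""
    n = len(G)
    if not 0 <= D < n:
        raise ValueError("D must lie between 0 and N - 1")
    # Pairs (i, i + D) for i = 0 .. N - D - 1; G is assumed symmetric.
    values = [G[i][i + D] for i in range(n - D)]
    return sum(values) / len(values)


# Hypothetical generalization matrix for four evenly spaced stimuli.
G = [
    [1.00, 0.50, 0.20, 0.10],
    [0.50, 1.00, 0.40, 0.30],
    [0.20, 0.40, 1.00, 0.60],
    [0.10, 0.30, 0.60, 1.00],
]
gradient = [average_generalization(G, D) for D in range(len(G))]
```

Here `gradient[D]` holds the average generalization at a separation of D steps, so plotting it against D gives the kind of empirical gradient discussed in the text.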

Now the average generalization 
G(D) and the separation D have 
properties that particularly recom- 
mend them for investigations of the 
relation between generalization and 
distance. First of all, D is quite in- 
sensitive to the physical measure used 
to space the stimuli evenly along the 
chosen dimension. Thus, if the stimuli
are squares differing only in size,
it does not much matter whether these 
squares are spaced in accordance with 
constant increments of area or length 
of side. It is a consequence of the 
linearity condition that these two 
variables ($a^2$ and $a$) will be almost
linearly related within the small range 
of variation permitted along the size 
dimension. In fact, any variable that 
is equivalent to these variables except 
for a continuous, differentiable trans- 
formation could presumably be used. 
Thus the stimuli could just as well be 
regularly spaced in terms of measures 
based upon psychophysical procedures 
such as the summation of jnd’s, cate- 
gory judgment, magnitude estimation, 
etc. (29). All of these measures are 
continuous and differentiable func- 
tions of the usual physical variables. 

This indifference of D to the variable
chosen for evenly spacing the
stimuli is further enhanced by computing
G(D) as an average for all
pairs of stimuli separated by D steps.
Thus, even if there is some residual
systematic contraction or expansion
of psychological distance, as one proceeds
from the first to the last pair of
stimuli at a given separation D, this
systematic effect will be largely canceled
out by averaging both ends of
the stimulus range together.

In Fig. 2 average generalization
G(D) is plotted against D for several
experiments on the learning of paired
associates. In each experiment several
different random assignments of
the responses to the stimuli were employed.
Since the average psychological
spacing of the stimuli and responses
varies somewhat from experiment
to experiment, the D-values for
each experiment are multiplied by a
constant factor to make them comparable
with the D-values for the other
experiments. The conclusion is clear:
Although the empirical points fall
closely along an exponential decay
function (the fitted curve), they deviate
markedly from the alternative
Functions A, B, and D of Fig. 1.

Fig. 2. An exponential decay function
fitted to data from several experiments involving
stimulus or response generalization
during paired-associate learning. The center
of each plotted symbol (triangle, square,
circle) has average generalization G(D) as
ordinate and distance D (multiplied by a constant
factor) as abscissa. The experiments
were conducted by Attneave (1), McGuire
(18, 26), and Shepard (26). In the experiment
with circles of variable color, the stimuli
were not confined to a single physical dimension.
Since the linearity condition was not
met in this case, the D-values were taken from
a multidimensional scaling solution obtained
by Torgerson for these same colors (30, Ch.
11). Since the stimuli were contained in a
relatively small region of psychological space,
it seems safe to assume that Torgerson’s
judgmental method of triads yields satisfactory
estimates of psychological distance in the
present sense. In the experiment on response
generalization, it had been found that the
two end-responses did not conform to the
linearity requirement (26). The plotted
data were therefore based upon the generalization
between the seven intermediate responses
only. Finally, Attneave’s experiments deviated
from the design presupposed by the present
analysis in the following respects. (a)
The stimuli were quite widely spaced and so
may not have met the linearity requirement.
(b) The number of trials was much smaller
than in the other experiments. This makes
the steady-state condition assumed in the ensuing
theoretical discussion seem a little unlikely.
(c) Since Attneave reported only the
sums $P_{ik}^S + P_{ki}^S$ but not the individual
probabilities, it was necessary to substitute
the approximate formula
$(P_{ik}^S + P_{ki}^S)/(P_{ii}^S + P_{kk}^S)$ for the geometric mean used in
Equation 1. However, in a previous investigation
this approximation was very close and,
indeed, possessed greater statistical stability
than the geometric mean (24). (d) There
were not sufficient data to make the estimation
of the constants $C^S$ feasible. For the
purposes of plotting the data, therefore, $C^S$
was assumed to be zero. That the agreement
between the various sets of data is good despite
the deviations from optimum conditions
in certain instances suggests that the data
may not be particularly sensitive to such
deviation.


DEDUCTION OF THE EXPONENTIAL
GRADIENT FROM A TRACE
MODEL


The principal purpose of this paper
is to propose a model to account for
the empirically determined exponential
form of the gradient of generalization
in terms of a hypothetical
trace process. The aim of such a
proposal is twofold: First, it is desired
to remove the apparent arbitrariness
of the exponential gradient by showing
that it follows from certain elementary
assumptions of a more intuitively
compelling character. Second, it is
hoped that such a model will provide
for prediction to other experiments in
which the usual paired-associate conditions
no longer prevail.

The model is suggested by recogni- 
tion experiments of the following kind. 
A subject is shown a certain square 
and, after a delay of $t$ units of time,
shown a second square which may or
may not be the same as the first. In
this situation the probability of responding
“same” is distributed, with
respect to the difference in size between
the first and second squares, according
to some bell-shaped density function.
Since the time error is usually small,
the mode falls near the zero-difference
point. The variance of the distribution,
however, increases appreciably
with the delay $t$ imposed between the
first and second exposure (2, 3, 10, 17).

The unidimensional stimuli, $S_1, S_2, \ldots, S_N$,
in an experiment on paired-associate
learning, will be designated
by small circles arranged in a vertical
row along the left, as in A of Fig. 3.
Corresponding to these, there are conceived
the internal representations
(perceptions), $S_1^*, S_2^*, \ldots, S_N^*$,
which will be designated by the small
circles on the right. Whenever a
stimulus $S_i$ is presented, it leads to
some one of the internal representations
$S_k^*$, with probability $P_{ik}$. If
the responses are so distinctive as
never to be confused, reinforcement
will ensue only if a stimulus is followed
by its corresponding representation
(25).

Fig. 3. The diffusion and deconditioning
of the stimulus trace. The arrows connecting
a single stimulus S to the alternative perceptual
representations S* in A, B, and C illustrate
how the trace elements are assumed to
spread out with time. In D, E, and F the
process is represented as continuous.

Suppose, then, that when $S_i$ leads to
$S_i^*$, a large number of trace elements
are conditioned (by reinforcement)
from $S_i$ to $S_i^*$. This bundle of trace
elements or “stimulus trace” is designated
by the arrow in A. Immediately
following the removal of the external
stimulus $S_i$, the trace elements
are subject to haphazard perturbations
so that some of the conditioned
trace elements wander to adjacent
stimulus representations as shown in
B. (One might imagine here the
action on the synapses of the random
molecular processes associated with
metabolism.) Later still, some of
these elements will wander even further
from $S_i^*$, as shown in C.

Now for any two stimuli differing 
along a single physical dimension, 
another stimulus can be found which 
is situated intermediately between 
these. Thus, as indicated in D, a 
continuum may be conceived as under- 
lying the corresponding internal repre- 
sentations. This continuum then is 
the psychological space of the stimuli.
After removal of the stimulus, the distribution
of trace elements progressively
spreads out in this space, as
illustrated in D, E, and F. 

In addition to this spread or diffu- 
sion of the trace, it is assumed that the 
trace elements are also subject to 
spontaneous deconditioning (again, 
presumably owing to random proc- 
esses at the molecular level). Thus, 
as the elapsed time increases, the 
number of elements still conditioned 
decays to zero. In Fig. 3, the shaded 
areas represent the fraction of the 
original trace elements which are still 
conditioned at each time. 


The Deconditioning of the Trace 


The probability that a given conditioned
element will suffer deconditioning
during a small interval $\Delta t$ will
be denoted by $U(\Delta t)$. The simplest
rule that can be assumed to govern
this probability is as follows:

Assumption I. $U(\Delta t)$ is a constant,
independent of the time the element
has remained conditioned and independent
of the distance (in psychological
space) to which the element has
drifted in that time.

The probability that an element,
still conditioned at time $t$, remains
conditioned until $t + \Delta t$ is, of course,
$1 - U(\Delta t)$. Therefore, the probability
that a given trace element remains
conditioned during the first $m$ intervals
of length $\Delta t$, but then becomes deconditioned
during the immediately succeeding
$\Delta t$-interval, is, by Assumption I,

$$[1 - U(\Delta t)]^m\, U(\Delta t) \qquad [4]$$


Now $U(\Delta t)$ is necessarily proportional
to the interval chosen for $\Delta t$.
It is convenient, therefore, to define
a deconditioning parameter,

$$U = \lim_{\Delta t \to 0} \frac{U(\Delta t)}{\Delta t} \qquad [5]$$

which does not depend upon the choice
of an arbitrary interval $\Delta t$. Then, if
the $t$-axis is translated so that conditioning
takes place at $t = 0$, $m$ can be
increased in such a way that, as $\Delta t \to 0$,
$m \cdot \Delta t \to t$.

Although the probability that a
given trace element is deconditioned at
precisely time $t$ is zero, the probability
per unit time (the “probability density”)
or rate of deconditioning for an
individual trace element is, at the
particular instant $t$,

$$\lim_{\Delta t \to 0}\, [1 - U(\Delta t)]^{t/\Delta t}\, U(\Delta t)/\Delta t$$

But by Equation 5, as $\Delta t \to 0$, $U(\Delta t) \to U \cdot \Delta t \to Ut/m$.
Therefore, the rate
of deconditioning at time $t$ is, for
single trace elements,

$$\lim_{m \to \infty}\, [1 - Ut/m]^m\, U = U \exp(-Ut) \qquad [6]$$

where, for convenience in what follows,
$\exp(-Ut)$ is used in place of
$e^{-Ut}$.

With regard to the deconditioned
trace elements, the following rule is
the simplest that can be assumed:

Assumption II. In the absence of
further reinforcements, a deconditioned
element remains deconditioned.

From this assumption and Equation
6 it follows that the fraction of the
originally conditioned elements remaining
conditioned at time $t$ is

$$U(t) = 1 - \int_0^t U \exp(-U\tau)\, d\tau = \exp(-Ut) \qquad [7]$$
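
The limiting argument behind Equations 6 and 7 can be checked numerically. The sketch below is an illustration under assumed parameter values: it compares the discrete survival probability after m intervals, each with deconditioning probability U times the interval length, with the limiting form exp(-Ut). The function names are hypothetical.

```python
import math


def survival_discrete(U, t, m):
    """Probability of remaining conditioned through m intervals of length t/m,
    taking the per-interval deconditioning probability to be U * (t/m)."""
    dt = t / m
    return (1.0 - U * dt) ** m


def survival_limit(U, t):
    """Equation 7: fraction of elements still conditioned at time t."""
    return math.exp(-U * t)


def deconditioning_rate(U, t):
    """Equation 6: probability density of deconditioning at time t."""
    return U * math.exp(-U * t)


# Illustrative values only: U = 0.5 per unit time, t = 2 time units.
coarse = survival_discrete(0.5, 2.0, 10)
fine = survival_discrete(0.5, 2.0, 100000)
limit = survival_limit(0.5, 2.0)
```

With these values the coarse partition falls somewhat short of the limit, while the fine partition agrees with exp(-Ut) to several decimal places.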


The Diffusion of the Trace 


Equation 7 completes the quantita- 
tive formulation of the deconditioning 
of the trace. In order to provide a 
similar formulation for the diffusion 
of the trace, it is necessary to consider
the motions of the individual trace
elements in psychological space. The
exposition is simplified by continuing
to suppose that the stimuli are evenly
spaced along a restricted range of a
single physical dimension. It is then
possible to introduce one coordinate
$x_k$ for each stimulus representation $S_k^*$
giving its position along a one-dimensional
psychological space. The psychological
distance between any two
stimuli, $S_i$ and $S_k$, is then

$$D_{ik} = |x_i - x_k| \qquad [8]$$


Now the expression $V_{ik}(\Delta x, \Delta t)$
will denote the probability that, if at
time $t$ a trace element is situated at $x_i$,
by time $t + \Delta t$ it will have moved into
the one-dimensional region bounded
by $x_k$ and $x_k + \Delta x$. As before, the
arbitrary interval $\Delta x$ can be eliminated
by defining a new quantity

$$V_{ik}(\Delta t) = \lim_{\Delta x \to 0} \frac{V_{ik}(\Delta x, \Delta t)}{\Delta x} \qquad [9]$$

$V_{ik}(\Delta t)$ is the probability density
of displacements from $x_i$ to $x_k$ during
the brief interval $\Delta t$. The simplest
rule that can reasonably be assumed
to govern this quantity is as follows:

Assumption III. $V_{ik}(\Delta t)$ is an
invariant function of the psychological
distance between $S_i$ and $S_k$. It
does not depend upon the time $t$ or
upon the absolute position of the pair,
$S_i$ and $S_k$, in psychological space.

If the x-axis is translated so that $x_i = 0$,
it follows from Assumption III
that, for a given interval $\Delta t$, there
exists a fixed function $f_{\Delta t}$ such that

$$V_{ik}(\Delta t) = f_{\Delta t}(D_{ik}) = f_{\Delta t}(|x_k|) \qquad [10]$$

However, it will not be necessary to
make any particular assumption about
the form of the function $f_{\Delta t}$. It is
only necessary to insure that, during
a short period of time, the probability
of a very large displacement is negligible.
This can be stated with greater
precision as follows:

Assumption IV. The function $f_{\Delta t}$
and, thus, the probability density of
displacements of a trace element from
$x = 0$ is distributed over the x-axis
with finite variance.

From Assumptions III and IV it
follows that the variance of the distribution
of displacements during an
interval $\Delta t$ is the finite constant

$$V(\Delta t) = \int_{-\infty}^{\infty} x_k^2\, V_{ik}(\Delta t)\, dx_k \qquad [11]$$

Since $V(\Delta t)$ must depend upon the
length of the interval $\Delta t$, it is useful to
define a diffusion parameter,

$$V = \lim_{\Delta t \to 0} \frac{V(\Delta t)}{\Delta t} \qquad [12]$$

which does not require the stipulation
of an interval $\Delta t$. Just as $U$ governs
the rate of deconditioning of the trace,
then, $V$ controls the rate of spread or
diffusion of the trace in psychological
space.

The next question to be answered is
this: Given the diffusion parameter $V$,
what form will the distribution of
trace elements take after an appreciable
delay $t$? It is possible to show
that the assumptions which have been
made are sufficient conditions for the
desired distribution to tend toward a
limiting form which is independent of
the form of $f_{\Delta t}$ (6, 15). Specifically,
if a trace comprising a large number
($n$) of elements is conditioned from $S_i$
to $x_i$ at $t = 0$, the density of these
elements at $x_k$ for some later time $t$
conforms with the Gaussian function

$$n \cdot V_{ik}(t) = n \cdot (2\pi V t)^{-1/2} \exp[-(D_{ik})^2 / 2Vt] \qquad [13]$$

The beauty of this result is its complete
independence from the underlying
mechanism symbolized by $f_{\Delta t}$.
Thus, even though one imagines, for
example, that thermal agitation of the
molecular substrate is responsible for 
deconditioning and diffusion, the bio- 
physical details of this process need 
not be specified. For, according to 
the present formulation, these details 
are irrelevant to the question of the 
gross behavior of the trace system. 
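
The independence of Equation 13 from the form of the step distribution can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the original analysis: it lets each trace element take uniformly distributed (hence non-Gaussian) steps whose variance per step is V times the step duration, and then compares the resulting spread with the Gaussian of variance Vt. Function names, the seed, and all parameter values are hypothetical.

```python
import math
import random


def simulate_diffusion(n_elements, V, t, steps, seed=0):
    """Positions of trace elements after time t, starting from x = 0.

    Each of `steps` displacements is uniform on [-a, a], where a is chosen
    so that the variance per step, a**2 / 3, equals V * (t / steps).
    """
    rng = random.Random(seed)
    dt = t / steps
    half_width = math.sqrt(3.0 * V * dt)
    positions = []
    for _ in range(n_elements):
        x = 0.0
        for _ in range(steps):
            x += rng.uniform(-half_width, half_width)
        positions.append(x)
    return positions


def gaussian_density(D, V, t):
    """Equation 13 with n = 1: density at distance D after delay t."""
    return (2.0 * math.pi * V * t) ** -0.5 * math.exp(-D * D / (2.0 * V * t))


# Small illustrative run: 2000 elements, V = 2, t = 1.5, 40 steps.
sample = simulate_diffusion(2000, V=2.0, t=1.5, steps=40, seed=1)
sample_variance = sum(x * x for x in sample) / len(sample)
```

The sample variance should lie near V t = 3, and the proportion of elements within one standard deviation of the origin should lie near the Gaussian value of about 0.68, even though no single step is Gaussian.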


The Trace Process in Paired-Associate 
Learning 


Now in the course of learning paired
associates, each stimulus $S_i$ will have
been presented on many occasions.
Furthermore, on a number of these
occasions, reinforcement of the response
assigned to $S_i$ will have conditioned
a bundle of $n$ trace elements to
$S_i^*$. Thus at some given time $t_0$, the
density of conditioned elements resulting
from the immediately preceding
reinforcement will be distributed
in psychological space as shown for
$t_{-1}$ in Fig. 4. Likewise, the densities
remaining from earlier reinforcements
will be distributed as shown for $t_{-2}$,
$t_{-3}$, and so on.

The total distribution of conditioned
elements emanating from $S_i$ at
$t_0$ can be found by summing the
Gaussian distributions resulting from
all previous reinforcements of the response
assigned to $S_i$. It is possible to
derive an analytic approximation to 
this composite distribution ($\int$ in
Fig. 4) by supposing that, after an
initial phase of rapid learning, reinforcements
occur at relatively frequent
and regular intervals. If the
early trials are disregarded, then, a
roughly steady-state process can be
considered. The summation of the
Gaussian curves of various ages can
then be approximated by an integration
of these curves over $t$.

Fig. 4. A series of Gaussian distributions
with increasing variances, and the composite
distribution arising from an integration of
these over time.

Now, with respect to $S_i$, the density
of trace elements at $x_k$ resulting from
a reinforcement $t$ time units ago is
given by $n \cdot V_{ik}(t)$. However, only
the fraction $U(t)$ of these is still conditioned.
Therefore the density of
elements at $x_k$ which remain conditioned
after a delay $t$ from reinforcement
is $n \cdot U(t) \cdot V_{ik}(t)$. Clearly, then,
the total density of conditioned trace
elements resulting from all previous
reinforcements of the response assigned
to $S_i$ is distributed in approximate accord
with

$$\int_0^\infty n \cdot U(t) \cdot V_{ik}(t)\, dt = \int_0^\infty n \cdot \exp(-Ut) \cdot (2\pi V t)^{-1/2} \cdot \exp[-(D_{ik})^2 / 2Vt]\, dt \qquad [14]$$

Fortunately, the integration can be
effected (4, p. 144) and, indeed, yields

$$n (2UV)^{-1/2} \cdot \exp[-D_{ik}(2U/V)^{1/2}] \qquad [15]$$

For intermediate stages of paired-associate
learning, then, this function
can be taken as a measure of the
strength of connection of the stimulus
$S_i$ to the point $x_k$ on the perceptual
continuum. It is not a probability
density, however, since (if $n > 1$ or
$U < 1$) the integration of this function
over the x-axis yields a value
greater than unity (7, p. 133).


Multiplication of Equation 15 by
$U/n$ converts this function to a probability
density,

$$P_{ik} = (U/2V)^{1/2} \cdot \exp[-D_{ik}(2U/V)^{1/2}] \qquad [16]$$

for then

$$\int_{-\infty}^{\infty} P_{ik}\, dx_k = 1 \qquad [17]$$
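
The step from Equation 14 to Equations 15 through 17 can be checked numerically. The following sketch is offered only as a verification aid under assumed parameter values; the function names, the midpoint rule, and the substitution t = s squared (used to remove the singularity at t = 0) are choices made for this example and are not taken from the sources cited.

```python
import math


def composite_density_closed(D, U, V, n=1.0):
    """Equation 15: n (2UV)^(-1/2) exp[-D (2U/V)^(1/2)]."""
    return n * (2.0 * U * V) ** -0.5 * math.exp(-abs(D) * math.sqrt(2.0 * U / V))


def composite_density_numeric(D, U, V, n=1.0, steps=20000):
    """Equation 14 by the midpoint rule after substituting t = s**2.

    With dt = 2 s ds the integrand becomes
    2 (2 pi V)^(-1/2) exp(-U s^2 - D^2 / (2 V s^2)), which is bounded at s = 0.
    """
    s_max = 10.0 / math.sqrt(U)  # exp(-U * s_max**2) = exp(-100), negligible
    ds = s_max / steps
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * ds
        total += math.exp(-U * s * s - D * D / (2.0 * V * s * s))
    return n * 2.0 * (2.0 * math.pi * V) ** -0.5 * total * ds


def probability_density(D, U, V):
    """Equation 16: Equation 15 multiplied by U / n."""
    return U * composite_density_closed(D, U, V, n=1.0)


def total_probability(U, V, half_range=60.0, steps=20000):
    """Equation 17: midpoint-rule integral of Equation 16 over the x-axis."""
    dx = 2.0 * half_range / steps
    return sum(
        probability_density(-half_range + (j + 0.5) * dx, U, V) for j in range(steps)
    ) * dx


# Illustrative parameter values only.
closed = composite_density_closed(1.5, U=0.5, V=2.0)
numeric = composite_density_numeric(1.5, U=0.5, V=2.0)
```

For these values the numerical integral and the closed form agree closely, and the density of Equation 16 integrates to unity within the accuracy of the midpoint rule.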


Now Equation 16 furnishes an estimate
of the conditional probability
(per unit x) of the particular perception
$x_k$, given the stimulus $S_i$. In
order to secure the probability of
taking $S_i$ to be $S_k$ (through stimulus
generalization), $S_k^*$ can be reinterpreted
as a finite region partitioned off
from psychological space in the neighborhood
of $x_k$. In this way the entire
one-dimensional space can be divided
into $N$ mutually exclusive and exhaustive
segments so that each segment
corresponds to one of the external
stimuli. A given trace element
is then said to be conditioned to $S_k^*$ at
time $t$ if and only if it falls in the region
containing $x_k$ at that time.

$P_{ik}$, reinterpreted in this way, can
be taken as an approximation to the
conditional probability that the external
stimulus $S_i$ will lead to the
internal representation $S_k^*$. If the
early trials are ignored (since these
trials do not sufficiently approach a
steady state), the constant $C^S$ in
Equation 1 can be disregarded (25).
Setting $C^S = 0$, and substituting the
right-hand member of Equation 16
for $P_{ik}^S$ in Equation 1, then, the
generalization between $S_i$ and $S_k$ assumes
the remarkably simple form

$$G_{ik}^S = \exp(-D_{ik}\sqrt{2U/V}) \qquad [18]$$

Letting $\kappa$ stand for the constant
$\sqrt{2U/V}$ and averaging $G_{ik}^S$ over all
pairs of stimuli separated by a fixed
distance $D$, Equation 3 now takes the
form

$$G(D) = e^{-\kappa D} \qquad [19]$$

But if $\kappa$ is identified with the constant
distance multiplier calculated for each
experiment, this is precisely the exponential
function fitted to the empirical
data in Fig. 2.
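
The substitution that yields Equation 18, and one way of estimating the decay constant from a set of gradient values, can be sketched as follows. The least-squares fit through the origin on the logarithm of G is an assumption introduced for illustration; it is not claimed to be the fitting procedure used for Fig. 2, and all names and values are hypothetical.

```python
import math


def density(D, U, V):
    """Equation 16: probability density at psychological distance D."""
    return math.sqrt(U / (2.0 * V)) * math.exp(-D * math.sqrt(2.0 * U / V))


def generalization_from_density(D, U, V):
    """Equation 1 with C = 0, using Equation 16 for every probability.

    By Assumption III the density depends only on distance, so
    P_ik = P_ki = density(D) and P_ii = P_kk = density(0).
    """
    p_off = density(D, U, V)
    p_diag = density(0.0, U, V)
    return math.sqrt((p_off * p_off) / (p_diag * p_diag))


def fit_kappa(distances, g_values):
    """Least-squares slope through the origin of -log G against D."""
    numerator = sum(-d * math.log(g) for d, g in zip(distances, g_values))
    denominator = sum(d * d for d in distances)
    return numerator / denominator


# Illustrative parameters: sqrt(2U/V) = 0.7 when U = 0.49 and V = 2.
U, V = 0.49, 2.0
distances = [1.0, 2.0, 3.0, 4.0, 5.0]
g_values = [generalization_from_density(d, U, V) for d in distances]
kappa_hat = fit_kappa(distances, g_values)
```

Because the generated values follow Equation 19 exactly, the fitted constant recovers the square root of 2U/V; with empirical data the same fit would give only an estimate.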


Further Aspects of the Trace Process 


The role of deconditioning. It might 
have seemed unnecessary to include 
the deconditioning assumptions (I and 
II) along with the diffusion assump- 
tions (III and IV), since the process of 
diffusion alone would account for the 
spread of the trace, and hence for 
stimulus generalization. However, 
from Equation 18 it is clear that, with- 
out deconditioning, generalization 
would be so extensive as to prohibit 
the learning of paired associates. 
For if $U = 0$, $G_{ik} = 1$ for all $i$ and $k$.
In this case the gradient is perfectly
flat, and discrimination between stimuli
is impossible. This is a consequence
of carrying the integration of
Equation 14 out to infinite $t$. Alternatively,
one could integrate only out
to some finite value $t = T$. However,
in terms of the model, this is equivalent
to assuming that the independent
elements simultaneously suffer deconditioning
at the same instant $T$
time units from conditioning. Rather
than postulate coincidences of
this kind, it seems more plausible to 
assume the gradual kind of fading away 
implied by Assumptions I and II. 
This fading away (or forgetting) then 
serves the adaptive function of weight- 
ing old, diffuse traces less heavily than 
new, accurate traces. 
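
The flattening of the gradient in the absence of deconditioning can be illustrated numerically. In the sketch below, which rests on assumed values and hypothetical names, Gaussian traces of all ages up to a cutoff are summed with U set to zero, and the density at distance D is expressed relative to the density at D = 0.

```python
import math


def truncated_gradient(D, V, cutoff, steps=20000):
    """Relative trace density at distance D when Gaussian traces of every
    age from 0 to `cutoff` are summed without any deconditioning (U = 0).

    After substituting t = s**2, the integrand of Equation 14 with U = 0 is
    proportional to exp(-D^2 / (2 V s^2)); the constant factor cancels in
    the ratio, and at D = 0 the integrand equals 1 throughout.
    """
    s_max = math.sqrt(cutoff)
    ds = s_max / steps
    at_distance = 0.0
    for j in range(steps):
        s = (j + 0.5) * ds
        at_distance += math.exp(-D * D / (2.0 * V * s * s))
    return (at_distance * ds) / s_max


# Illustrative values: D = 1, V = 1, with progressively later cutoffs.
g_short = truncated_gradient(1.0, 1.0, 1.0)
g_medium = truncated_gradient(1.0, 1.0, 100.0)
g_long = truncated_gradient(1.0, 1.0, 10000.0)
```

As the cutoff recedes, the ratio rises toward unity, which is the perfectly flat gradient described in the text for U = 0.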

Of course, the integration of Equation
14 out to infinite $t$ is not strictly
justified, since the learning experi- 
ment itself proceeds for only a finite 
period. The error introduced by the 
infinite integration is small, however, 
if the deconditioning parameter U is 
sufficiently greater than zero. For
then the hypothetical traces persisting
from reinforced trials that might
be imagined as preceding the actual 
beginning of the experiment would be 
almost totally deconditioned during 
all but the early trials of the learning 
experiment. 

Asymmetric generalization. It has 
sometimes been suggested that there 
exist asymmetries in which a generalization
going in one direction, e.g.,
$S_i \to S_k^*$, is more probable than a generalization
going in the reverse direction,
$S_k \to S_i^*$. At first glance, such
a possibility appears to violate Assumption
III, according to which the
probability of a displacement for a
given trace element depends only
upon the length (and not upon the
direction) of that displacement.
However, an account of such asymmetries
which is consistent with Assumption
III is suggested by the
analysis proposed by Bush and Mosteller.
According to their model, the
psychological dissimilarity going from
$S_i$ to $S_k$ is equal to the dissimilarity
going from $S_k$ to $S_i$ only if the “set
of stimulus elements” comprising $S_i$
has the same measure as the set of
stimulus elements comprising $S_k$ (5).
In terms of the trace model, then, it
can be supposed that the area of the
region of psychological space designated
as $S_i^*$ may be greater or
smaller than the area of the region
designated as $S_k^*$. Since the probability
that a given trace element
(wandering at random) will occupy a
certain region during an interval $\Delta t$ is
proportional to the area of that
region, $P_{ik}^S$ does not necessarily equal
$P_{ki}^S$. Indeed, the weight $W_i^S$, proposed
earlier by Shepard (25), is
presumably a measure of the area of
the region corresponding to $S_i$. This
interpretation suggests why the empirically
obtained weights tended to
be greatest for stimuli or responses at
the end of a linear array (26).
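
The way unequal regions produce asymmetric generalization can be sketched by integrating the density of Equation 16 over two segments of different width. The segment boundaries, the value of the decay constant, and the function names below are assumptions chosen only to illustrate the argument.

```python
import math


def trace_cdf(x, center, kappa):
    """Cumulative form of Equation 16, a two-sided exponential density
    (kappa / 2) * exp(-kappa * |x - center|) with kappa = sqrt(2U/V)."""
    z = kappa * (x - center)
    if z < 0.0:
        return 0.5 * math.exp(z)
    return 1.0 - 0.5 * math.exp(-z)


def region_probability(center, region, kappa):
    """Probability that a trace element from `center` lies in `region`."""
    lower, upper = region
    return trace_cdf(upper, center, kappa) - trace_cdf(lower, center, kappa)


kappa = 1.0
# Hypothetical layout: S_i has a narrow region, S_k a wide one at the
# end of the array.
x_i, region_i = 0.0, (-0.25, 0.25)
x_k, region_k = 1.0, (0.25, 2.0)

P_ik = region_probability(x_i, region_k, kappa)  # S_i taken to be S_k
P_ki = region_probability(x_k, region_i, kappa)  # S_k taken to be S_i
```

Although the density itself is symmetric in distance, the wider region collects more of the wandering elements, so `P_ik` exceeds `P_ki` in this layout.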

Multidimensional generalization. 
For expository reasons the stimuli 
were supposed to vary along a re- 
stricted range of a single physical di- 
mension. Actually, by making use 
of the theory of random motions in 
Euclidean spaces of more than one 
dimension (6), it can be shown that 
the trace model leads to the same ex- 
ponential gradient in either of the 
two following multidimensional cases: 
(a) the psychological space is Euclid- 
ean; (b) the stimuli are confined to a 
small region of psychological space. 
(The second case follows from the 
first. For, by the hypothesized rela- 
tion between physical and psychologi- 
cal space, a sufficiently small region 
of even a non-Euclidean space will be 
approximately Euclidean.) The 
treatment of generalization over large 
distances in non-Euclidean spaces, 
however, awaits the development of a 
general theory of random motions in 
spaces of positive and negative curva- 
tures. 

Response generalization. The pre- 
ceding discussion has been formulated 
solely in terms of stimulus generaliza- 
tion. However, the same trace model 
can also be applied in the case of 
response generalization. Suppose, for 
example, that the responses are closely 
spaced along a single physical dimension,
whereas the stimuli are completely
discriminable. When one response
$R_i$ is intended, some other
response $R_j$ may actually be made.
Using a notation analogous to that
adopted for stimulus generalization,
it can be said that $R_i^*$ leads to $R_j$ with
probability $P_{ij}^R$. If then $R_i^*$ leads
to $R_i$, the ensuing reinforcement
conditions a large number of trace
elements from $R_i^*$ to a point $x_i$ in the
psychological space of the responses.
These conditioned trace elements are 
then subject to the rules already set
forth in Assumptions I through IV.


THE FORM OF THE GRADIENT AND 
THE SCHEDULE OF REINFORCEMENT 


In the last section the apparent 
arbitrariness of the specific function 
fitted to the paired-associate data of 
Fig. 2 was reduced by showing that 
this function is a consequence of 
elementary assumptions which do not 
involve the specification of any partic- 
ular function. The purpose of this 
section is to determine whether these 
same assumptions have any conse- 
quences for different types of experi- 
ments in which the sequence of rein- 
forced and nonreinforced trials is 
manipulated, as in the study by Hum- 
phreys (13). 

Certainly, from Fig. 4 it is clear 
that the sharp peak of the combined 
gradient at x = 0 is contributed
solely by the distributions of trace
elements resulting from the most re- 
cent reinforcements (like the dis- 
tribution ¢_;). Therefore, if feedback 
as to the correctness of each response 
is terminated, the composite gradient
should become rounded (convex upward)
in the vicinity of x = 0 and,
under continued extinction, gradually 
flatten out. Likewise, if reinforcement
is delivered only for every mth
correct response, the composite gradi- 
ent should assume an intermediate 
form which tends more toward either 
an exponential or a bell-shaped curve 
as m is made smaller or larger. Con- 
siderations of this kind may account 
for the finding of Humphreys that, 
whereas with 100 per cent reinforce- 
ment the generalization gradient was 
concave upward, with 50 per cent 
reinforcement it was initially convex 
upward (13). 
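
The qualitative argument of this paragraph can be illustrated numerically. The Python sketch below is not the derivation from Assumptions I through IV; it assumes, purely for illustration, that elements conditioned at x = 0 spread as a Gaussian whose variance grows linearly with the number of trials since conditioning, and that the number of surviving conditioned elements shrinks geometrically with age. The names `trace_density`, `composite_gradient`, and `peak_shape`, and the values of `DIFFUSION` and `RETENTION`, are hypothetical.

```python
import math

DIFFUSION = 0.05   # assumed variance added per trial (illustrative value)
RETENTION = 0.9    # assumed fraction of conditioned elements surviving each trial


def trace_density(x, age):
    """Gaussian density at x of elements conditioned `age` trials ago."""
    variance = DIFFUSION * age
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def composite_gradient(x, min_age, max_age=200):
    """Sum the surviving distributions whose age is at least `min_age` trials."""
    return sum(
        RETENTION ** age * trace_density(x, age)
        for age in range(min_age, max_age + 1)
    )


def peak_shape(min_age, step=0.5):
    """Ratio of the drop over [0, step] to the drop over [step, 2 * step].

    A ratio above 1 means the gradient falls fastest right at x = 0
    (concave upward, exponential-like); a ratio below 1 means the peak
    is rounded (convex upward, bell-like).
    """
    g0 = composite_gradient(0.0, min_age)
    g1 = composite_gradient(step, min_age)
    g2 = composite_gradient(2.0 * step, min_age)
    return (g0 - g1) / (g1 - g2)
```

Under these assumptions, `peak_shape(1)` (reinforcement on every trial, so distributions of age 1 are present) exceeds 1, whereas `peak_shape(40)` (no reinforcement for the last 39 trials) falls below 1, which is the direction of change described in the text.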

Fig. 5. The gradient of generalization as
a function of the number of trials intervening
between a response and the last preceding
feedback as to the correctness of that response.
The numbers of intervening trials are grouped
together as follows: I. 0 trials; II. 1-3 trials;
III. 4-8 trials; IV. 9-15 trials; and V. 16 or
more trials.

It is also feasible to analyze data
from paired-associate experiments
with 100 per cent reinforcement for
this effect. Since the stimuli are pre- 
sented in random order, the delay be- 
tween successive occurrences of a 
given stimulus or response varies over 
a considerable range. Thus separate 
gradients can be plotted depending on 
the number of trials intervening be- 
tween a given stimulus-response se- 
quence and the most recent feedback 
concerning the correctness of that 
particular sequence. The most ex- 
tensive data currently available in a 
form suitable for this analysis come 
from an experiment on response gen- 
eralization in paired-associate learning 
(26). Those data are therefore re- 
analyzed and plotted in Fig. 5. As 
predicted from the model, the gradient 
of generalization systematically
changes from concave to convex up- 
ward in the vicinity of the correct 
response as the number of trials inter- 
vening between a response and the 
last differential reinforcement of that
response is increased. 

These results may help to explain 
why the so-called “discriminal dispersion”
observed in absolute-judgment
and identification experiments
seems to conform to a Gaussian or 
normal density function (9, pp. 317- 
319). Since differential reinforce- 
ments are not usually provided in 
these experiments, the discriminal dis- 
persion is presumably maintained by 
those haphazard reinforcements of 
everyday existence which antedate 
the beginning of the experiment 
proper. Under these circumstances 
the gradient must be relatively meso- 
kurtic, as illustrated in V of Fig. 5. 
This gradient resembles a Gaussian 
function, and is quite unlike the com- 
paratively leptokurtic gradients (such 
as I) which have been shaped by re- 
cent differential reinforcement. 
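
The contrast between leptokurtic and mesokurtic gradients can be made concrete with a small calculation. The Python sketch below rests on assumptions that are not part of the text: the composite gradient is treated as a mixture of zero-mean Gaussians whose variances grow linearly with trials since conditioning, with mixing weights that shrink geometrically with age. For such a mixture the kurtosis is 3 E[v^2] / (E[v])^2, where v is the component variance; a single Gaussian gives exactly 3. The name `mixture_kurtosis` and the values of `DIFFUSION` and `RETENTION` are hypothetical.

```python
DIFFUSION = 0.05   # assumed variance added per trial (illustrative value)
RETENTION = 0.9    # assumed fraction of conditioned elements surviving each trial


def mixture_kurtosis(min_age, max_age=400):
    """Kurtosis (fourth moment over squared variance) of the composite gradient.

    Only distributions at least `min_age` trials old contribute, which mimics
    the absence of reinforcement during the most recent `min_age - 1` trials.
    """
    ages = range(min_age, max_age + 1)
    weights = [RETENTION ** age for age in ages]
    total = sum(weights)
    # Mean and mean square of the component variances under the mixing weights.
    mean_v = sum(w * DIFFUSION * a for w, a in zip(weights, ages)) / total
    mean_v2 = sum(w * (DIFFUSION * a) ** 2 for w, a in zip(weights, ages)) / total
    return 3.0 * mean_v2 / mean_v ** 2
```

With these illustrative values, `mixture_kurtosis(1)` is well above the Gaussian value of 3, while `mixture_kurtosis(40)` lies only slightly above 3, in keeping with the suggestion that gradients maintained solely by remote reinforcements resemble a Gaussian function.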


MICROMECHANICAL AND MACROME- 
CHANICAL MODELS FOR GENERAL- 
IZATION 


The present trace model might be 
termed a micromechanical model be- 
cause it is based upon assumptions 
about the fine-grain or “microscopic”
mechanics of the underlying trace 
process. This model is to be dis- 
tinguished from an earlier macrome- 
chanical model derived from assump- 
tions pertaining to large-scale or 
“macroscopic” aspects of generaliza- 
tion (25). However, to the extent 
that the two models are compatible, 
they can be integrated so that the 
macromechanical assumptions of the 
earlier model will appear as deductions 
from the more primitive microme- 
chanical assumptions of the present 
model. The aim of such an integra- 
tion would be to broaden the scope of 
the earlier model. For, whereas the 
macromechanical model applies only 
to paired-associate experiments with
continual differential reinforcement, 
the micromechanical model has im- 
plications for a considerably wider 
range of experiments. 

The three basic assumptions from 
which the macromechanical model was 
originally derived are as follows. (a) 
Stimulus and response generalizations 
take place independently of each 
other. (b) The probability of a
stimulus generalization is an expo- 
nential decay function of the psycho- 
logical distance between the stimuli. 
(c) The probability of a response gen- 
eralization is an exponential decay 
function of the psychological distance 
between the responses. Now to say 
that the scope of the earlier model will 
be increased by the adjunction of the 
micromechanical assumptions is to 
say that the macromechanical assumptions
(a, b, and c) will retain
their original form only under cer- 
tain limiting conditions (e.g., under 
continual reinforcement). When 
these conditions are modified (as 
when the reinforcements are delivered 
only intermittently), Propositions b
and c will have to assume different 
forms in accordance with the con- 
clusions of the last section. 
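
For the limiting case of continual reinforcement, assumptions a, b, and c can be sketched in a few lines of Python. This is one possible reading, not the earlier model as published: it assumes that items lie at known positions on a single psychological dimension, that each row of generalization probabilities is normalized to sum to one, and that the reinforced assignment pairs stimulus k with response k, so that independence of the two stages amounts to a matrix product. The function names and the `decay_rate` parameter are hypothetical.

```python
import math


def generalization_matrix(positions, decay_rate):
    """Row-normalized probabilities that item i generalizes to item j.

    Each entry decays exponentially with the distance between the items
    (assumptions b and c); the normalization is an added assumption.
    """
    rows = []
    for xi in positions:
        raw = [math.exp(-decay_rate * abs(xi - xj)) for xj in positions]
        total = sum(raw)
        rows.append([value / total for value in raw])
    return rows


def stimulus_response_matrix(stim_gen, resp_gen):
    """P(stimulus i leads to response j) with independent stages (assumption a).

    Stimulus i first generalizes to stimulus k, whose assigned response k
    then generalizes to response j; the two stages are summed over k.
    """
    n = len(stim_gen)
    return [
        [sum(stim_gen[i][k] * resp_gen[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
```

For example, `stimulus_response_matrix(generalization_matrix([0, 1, 2, 3], 1.0), generalization_matrix([0, 2, 4, 6], 1.0))` gives rows that each sum to one, with probabilities that fall off as the response moves away from the one assigned to the presented stimulus.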

Even with continual reinforcement, 
there is one case of paired-associate 
learning for which the macromechani- 
cal assumptions would have a different 
form if deduced from the microme- 
chanical assumptions. Specifically, if 
the interstimulus distances and the 
interresponse distances are both quite 
small, the trace model does not lead 
directly to the exponential gradient. 
Because occasional sequences of the 
form $S_i \to S_j^* \to R_j^* \to R_i$ will be
reinforced, trace elements will be conditioned
from $S_i$ to $x_j$ (rather than
only to $x_i$). Events of this kind will
somewhat alter the form of the gradi- 
ent. (In a previous experiment with 
generalization, both between the stimuli
and between the responses, no
significant departure from prediction 
on the assumption of an exponential 
gradient was observed [26]. How- 
ever, the theoretically expected devi- 
ations would be quite small and may 
have been obscured by the rather 
large variability of the data from that 
experiment.) 

Detailed derivations from the mi- 
cromechanical assumptions are quite 
complicated in the case of simultane- 
ous stimulus and response generaliza- 
tion, owing to the circumstance that, 
in this case, the form of the gradient 
depends upon the particular stimulus- 
response assignment enforced. Part 
of this complication is connected with 
the absence in both models of any 
account of the decrease in generaliza- 
tion which necessarily accompanies 
learning. An entirely satisfactory 
treatment of these problems will 
probably require an even more basic 
integration of these models with the 
already extensively developed models 
for learning per se. 


SUMMARY 


The problem of the relation between 
generalization and dissimilarity (i.e., 
the problem of the shape of the 
“gradient of generalization”) is reexamined
in the light of recent theoretical
and empirical developments.
With regard to experimental arrangements
in which reinforcements are delivered
in accordance with a one-to-one
assignment of the responses to the
stimuli (as in paired-associate learn- 
ing), the following conclusions are 
drawn: 

1. Measures of generalization can 
be defined in terms of the conditional 
probabilities with which the various 
stimuli lead to the various responses. 

2. Thus defined, stimulus general- 
ization and response generalization are 
both invariant functions of interstimulus
and interresponse dissimilarities,
respectively, provided that two
conditions are met. First, dissimilarity
is reinterpreted to mean a
“psychological distance” which (a)
is equivalent to “physical distance”
except for a continuous, differentiable
transformation, and (b) satisfies the
metric axioms. Second, a given
schedule of reinforcement is main- 
tained. 

3. Under conditions of frequent 
and regular reinforcement (as in the 
typical paired-associate experiment), 
the gradient of generalization is 
closely approximated by an exponen- 
tial decay function (concave upward). 

4. Under conditions of infrequent 
or intermittent reinforcement, this 
gradient departs from the exponential 
function in that it is convex upward in 
the vicinity of the reinforced stimulus 
or response. 

5. The empirically observed gradi- 
ents of generalization can be deduced 
from a mathematical model based upon 
four elementary assumptions con- 
cerning the temporal decay of stimu- 
lus and response traces. 